ABSTRACT



Linear controls are a well known simple technique for achieving variance 
reduction in computer simulation. Unfortunately the effectiveness of a linear 
control depends upon the correlation between the statistic of interest and the 
control, which is often low. Since statistics often have a nonlinear relation- 
ship with the potential control variables, nonlinear controls offer a means for 
improvement over linear controls. This paper focuses on the use of nonlin- 
ear controls for reducing the variance of quantile estimates in simulation. It is 
shown that one can substantially reduce the analytic effort required to develop a 
nonlinear control from a quantile estimator by using a strictly monotone trans- 
formation to create the nonlinear control. It is also shown that as one increases 
the sample size for the quantile estimator, the asymptotic multivariate normal 
distribution of the quantile of interest and the control reduces the effectiveness 
of the nonlinear control to that of the linear control. However, the data has 
to be sectioned to obtain an estimate of the variance of the controlled quantile 
estimate. Graphical methods are suggested for selecting the section size that 
maximizes the effectiveness of the nonlinear control. 



1 OUTLINE OF THE PAPER 

The paper begins with a short discussion of quantiles and the properties of a 
quantile estimator, with emphasis on the need for a reliable estimator for the vari- 
ance of the quantile estimator. The next part of the paper discusses linear controls 
for quantile estimates and the subtleties involved with estimating the coefficients 
for the control functions. The discussion of linear controls is followed by a discussion
of nonlinear controls and their application to reducing the variance of quantile
estimates for a fixed simulation sample size. The final part of the paper presents an
extract of results from a simulation experiment where crude, linearly controlled and 
nonlinearly controlled estimators are compared. Throughout the paper the empha- 
sis is on quantile estimation for continuous random variables, though other cases 
are of interest. 



2 QUANTILES 

2.1 Properties of a Quantile Estimator 

Let $Y$ be a random variable with a right-continuous distribution function defined
by

\[ F_Y(y) = \Pr\{Y \le y\}, \qquad -\infty < y < \infty. \]

Following Serfling (1980), define the $\alpha$ quantile of $Y$, $y_\alpha$, for $0 < \alpha < 1$, as the value

\[ F_Y^{-1}(\alpha) = \inf\{y : F_Y(y) \ge \alpha\}. \tag{1} \]

If $F_Y(y)$ is strictly increasing, $y_\alpha$ is unique for each $\alpha$. Additional restrictions
on $F_Y(y)$, such as continuity at $y_\alpha$, may be needed for the existence of certain
asymptotic properties and will be stated as required.

Given a simulation sample of $n$ independent and identically distributed (i.i.d.)
samples of $Y$, namely $Y_1, \ldots, Y_n$, one can construct a sample distribution function,
$F_n$, by placing at each observation $Y_i$ a mass $1/n$. Thus $F_n$ may be represented as

\[ F_n(y) = \frac{1}{n} \sum_{i=1}^{n} I(Y_i \le y), \qquad -\infty < y < \infty, \]

where $I(\cdot)$ is an indicator function which returns 1 if the argument is true and 0
otherwise.

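For a sample of size $n$, one can define a nonparametric estimator of the $\alpha$ quantile,
the sample $\alpha$ quantile of the sample distribution function, or

\[ \hat{y}_\alpha(n) = F_n^{-1}(\alpha). \]

Using the sample $\alpha$ quantile to estimate $y_\alpha$ is equivalent to using the order statistics
of the sample, $Y_{(1)} \le \cdots \le Y_{(n)}$, and defining a nonparametric estimator of the
$\alpha$ quantile, $\hat{y}_\alpha(n)$, as in Lewis and Orav (1989), as

\[
\hat{y}_\alpha(n) =
\begin{cases}
Y_{(n\alpha)} & \text{if } n\alpha \text{ is an integer} \\
Y_{(\lfloor n\alpha \rfloor + 1)} & \text{if } n\alpha \text{ is not an integer}
\end{cases}
\tag{2}
\]

where $\lfloor w \rfloor$ denotes the integral part of $w$.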
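A minimal sketch of the order-statistic rule in (2), assuming the sample is held in an ordinary Python sequence. The function name `sample_quantile` and the floating-point tolerance used to decide whether $n\alpha$ is an integer are illustrative choices, not part of the paper.

```python
import math


def sample_quantile(sample, alpha):
    """Order-statistic estimator of the alpha quantile, following rule (2)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    ordered = sorted(sample)
    n = len(ordered)
    if n == 0:
        raise ValueError("sample must not be empty")
    na = n * alpha
    # 1-based rank r: n*alpha if that is an integer, else floor(n*alpha) + 1.
    # math.isclose guards against round-off such as 10 * 0.7 != 7 exactly.
    if math.isclose(na, round(na)):
        r = round(na)
    else:
        r = math.floor(na) + 1
    return ordered[r - 1]


if __name__ == "__main__":
    print(sample_quantile([5, 1, 4, 2, 3], 0.5))
```

For five observations and $\alpha = 0.5$, $n\alpha = 2.5$ is not an integer, so the rule selects the third order statistic.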

For a given $n$ and $\alpha$, $\hat{y}_\alpha(n)$ is the $r$th order statistic from the $n$-sized sample
where $r$ is determined as in (2). The following results on the distribution of $\hat{y}_\alpha(n)$
are well known (David 1970, chap. 1-3 or Kendall and Stuart 1977, pp. 251-252).
Let $F_{\hat{y}_\alpha(n)}(y)$ be the cumulative distribution function of the quantile estimator.
Then $F_{\hat{y}_\alpha(n)}(y)$ can be written as

\[
\begin{aligned}
F_{\hat{y}_\alpha(n)}(y) &= \Pr\{\hat{y}_\alpha(n) \le y\} \\
 &= \Pr\{\text{at least } r \text{ of the } n\ Y_k \text{ are} \le y\} \\
 &= \sum_{i=r}^{n} \binom{n}{i} F_Y^{\,i}(y)\,[1 - F_Y(y)]^{\,n-i},
\end{aligned}
\tag{3}
\]



since the term in the summand is the binomial probability that exactly $i$ of the $Y_k$
are less than or equal to $y$. If the $Y_k$ are continuous with a density function $f_Y(y)$,
the density function of $\hat{y}_\alpha(n)$ is

\[ f_{\hat{y}_\alpha(n)}(y) = \frac{1}{B(r,\, n-r+1)}\, F_Y^{\,r-1}(y)\,[1 - F_Y(y)]^{\,n-r} f_Y(y) \]

where $B(\cdot,\cdot)$ represents the complete beta function. Unfortunately, while $\hat{y}_\alpha(n)$ is a
nonparametric estimator, (3) shows that the distribution of the quantile estimator
$\hat{y}_\alpha(n)$ depends not only on $n$ and $\alpha$ but also on the unknown distribution of the
underlying $Y$.
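The binomial sum in (3) is easy to evaluate once a value of $F_Y(y)$ is supplied. The sketch below is illustrative only; the function name `order_stat_cdf` and its argument names are assumptions made for this example.

```python
from math import comb


def order_stat_cdf(f_y, r, n):
    """Pr{r-th order statistic of n i.i.d. samples <= y}, given f_y = F_Y(y).

    Implements the binomial sum in (3): at least r of the n samples fall at
    or below y.
    """
    if not 0.0 <= f_y <= 1.0:
        raise ValueError("f_y must be a probability")
    if not 1 <= r <= n:
        raise ValueError("r must satisfy 1 <= r <= n")
    return sum(comb(n, i) * f_y**i * (1.0 - f_y) ** (n - i) for i in range(r, n + 1))


if __name__ == "__main__":
    # Median of n = 2 samples uses r = 1 under rule (2), since n * alpha = 1.
    print(order_stat_cdf(0.5, 1, 2))
```

The dependence on `f_y` makes the point of the surrounding text concrete: the distribution of the estimator cannot be evaluated without knowing the distribution of $Y$.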

The bias and variance of $\hat{y}_\alpha(n)$ also depend on $n$, $\alpha$, and the distribution of the
underlying $Y$. Assume that $F_Y(y)$ is continuous with a density function $f_Y(y)$ which
is differentiable and nonzero at $y_\alpha$. The following result for the expected value of
the quantile estimator can be derived from results in David (1970, p. 65):

\[
E[\hat{y}_\alpha(n)] = y_\alpha - \frac{\epsilon}{n f_Y(y_\alpha)}
 - \frac{\alpha(1-\alpha)\, f_Y'(y_\alpha)}{2(n+2)\, f_Y^{3}(y_\alpha)} + O\!\left(\frac{1}{n^2}\right)
\tag{4}
\]

where $\epsilon$ is a sawtooth function of $n$ and $\alpha$ such that $|\epsilon| < 1$ and $f'(\cdot)$ denotes
the derivative of the function $f(\cdot)$. An expansion for the variance of the quantile
estimator can be derived in similar fashion as

\[
\operatorname{var}[\hat{y}_\alpha(n)] = \sigma^2_{\hat{y}_\alpha(n)}
 = \frac{\alpha(1-\alpha)}{(n+2)\, f_Y^{2}(y_\alpha)} + O\!\left(\frac{1}{n^2}\right).
\tag{5}
\]

The notation $g(n) = O(1/n^2)$ means that the absolute value of $g(n)/(1/n^2)$ remains
bounded as $n$ goes to infinity.

There are also well known asymptotic results for $\hat{y}_\alpha(n)$ (Serfling, 1980, sec. 2.3).

• If $y_\alpha$ is the unique solution $y$ of $F(y-) \le \alpha \le F(y)$, then $\hat{y}_\alpha(n) \to y_\alpha$ with
probability 1 as $n \to \infty$.

• If $F_Y(y)$ possesses a density $f_Y(y)$ in a neighborhood of $y_\alpha$ and $f_Y(y)$ is positive
and continuous at $y_\alpha$, then $\hat{y}_\alpha(n)$ has an asymptotic normal distribution
in that

\[ \hat{y}_\alpha(n) \sim N\!\left(y_\alpha,\ \frac{\alpha(1-\alpha)}{n f_Y^{2}(y_\alpha)}\right) \quad \text{as } n \to \infty. \]

• Weiss (1964) proved that under mild conditions, the sample marginal quantiles
from a multivariate population with an absolutely continuous joint distribution 
function have an asymptotic multivariate normal distribution. The asymptotic 
covariance is a function of the multivariate distribution of the underlying mul- 
tivariate population. This multivariate result is important because of the role 
of the joint distribution of the controlled and controlling statistics in the theory 
of controls for variance reduction. 

2.2 Using Sectioning to Estimate the Variance of a Quantile Estimator

When using (2) to calculate a point estimate of the $\alpha$ quantile, one must also
estimate the variance or equivalently the standard deviation of the point estimate.
One could estimate the density of $Y$ at $y_\alpha$ and use (5) to estimate the variance.
However, the instability of density estimates at extreme quantiles can cause this to
be a very biased and unstable estimate of the variance of $\hat{y}_\alpha(n)$. A more general
technique is to use sectioning to calculate both a point estimate of the quantile
and an estimate of the variance of the point estimate. While nonparametric confidence
intervals are available for crude quantile estimates (see Mood, Graybill and
Boes 1974, p. 312), the confidence intervals are not appropriate for controlled
estimates. A brief discussion of sectioning follows; for a detailed discussion of sectioning
see Lewis and Orav (1989, chap. 9).

Let the random variable $\hat{y}_\alpha(n)$ be the function of independent and identically
distributed random variables $Y_1, \ldots, Y_n$ defined in (2) such that $\hat{y}_\alpha(n)$ is a point
estimator of $y_\alpha$. Let $\sigma^2_{\hat{y}_\alpha(n)}$ denote the variance of $\hat{y}_\alpha(n)$. Assume for now that there
are a total of $N = m \times n$ independent samples of $Y$, namely $Y_1, \ldots, Y_i, \ldots, Y_N$. The
sectioned point estimator, $\hat{y}_\alpha(m,n)$, is constructed as follows:

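1. Divide the $N$ samples of the random variable $Y$ into $m$ sections with $n$ samples
each where for simplicity $n \times m = N$ (equivalently, replicate a sample of size $n$,
$m$ times).

2. For the $j$th section, $j = 1, \ldots, m$, use (2) to compute $\hat{y}_{\alpha,j}(n)$.

3. Compute $\hat{y}_\alpha(m,n)$ as:

\[ \hat{y}_\alpha(m,n) = \frac{1}{m} \sum_{j=1}^{m} \hat{y}_{\alpha,j}(n). \tag{6} \]

The point estimator $\hat{y}_\alpha(m,n)$ is a sample mean of $m$ independent estimates,
each of which is based on $n$ samples.

4. Estimate the variance of $\hat{y}_\alpha(m,n)$, namely $\sigma^2_{\hat{y}_\alpha(m,n)}$, with the sample variance
of the sample mean:

\[ S^2_{\hat{y}_\alpha(m,n)} = \frac{1}{m(m-1)} \sum_{j=1}^{m} \left[\hat{y}_{\alpha,j}(n) - \hat{y}_\alpha(m,n)\right]^2. \tag{7} \]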
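The four steps above can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: the names `sample_quantile` and `sectioned_quantile` are hypothetical, sections are taken as consecutive slices of the data, and any samples left over when $N$ is not a multiple of $m$ are simply ignored.

```python
import math
import statistics


def sample_quantile(sample, alpha):
    """Order-statistic estimator of the alpha quantile, rule (2)."""
    ordered = sorted(sample)
    na = len(ordered) * alpha
    r = round(na) if math.isclose(na, round(na)) else math.floor(na) + 1
    return ordered[r - 1]


def sectioned_quantile(samples, m, alpha):
    """Return (point estimate (6), variance estimate (7), section estimates)."""
    n = len(samples) // m  # section size; leftover samples are dropped
    if m < 2 or n < 1:
        raise ValueError("need at least two non-empty sections")
    # Step 2: one quantile estimate per section.
    sections = [sample_quantile(samples[j * n:(j + 1) * n], alpha) for j in range(m)]
    # Step 3: the point estimate is the mean of the section estimates.
    point = statistics.fmean(sections)
    # Step 4: sample variance of the mean, i.e. sum of squares / (m (m - 1)).
    var_of_mean = statistics.variance(sections) / m
    return point, var_of_mean, sections


if __name__ == "__main__":
    print(sectioned_quantile(list(range(1, 21)), m=2, alpha=0.5))
```

Consecutive slicing is only appropriate when the samples are already in random order, as they are for i.i.d. simulation output.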
One advantage of sectioning to estimate the variance of the quantile estimate
over estimating the density is that since the $\hat{y}_{\alpha,j}(n)$ in step 2 above are i.i.d. and
the point estimator $\hat{y}_\alpha(m,n)$ is their sample mean, $S^2_{\hat{y}_\alpha(m,n)}$ is an unbiased estimate
of the variance of the point estimate. Furthermore, if the $\hat{y}_{\alpha,j}(n)$ are approximately
normally distributed, one can develop approximate confidence intervals for $\hat{y}_\alpha(m,n)$
based on a $t$ statistic with $m - 1$ degrees of freedom. A disadvantage of sectioning
is the increase in the bias of the point estimate; the first-order bias predicted by (4)
for $\hat{y}_\alpha(m,n)$ is $m$ times that for $\hat{y}_\alpha(N)$, a point estimate based on all $N$ samples.

For fixed $N$, the selection of $m$ and $n$ involves a tradeoff between the bias and
the variance of $\hat{y}_\alpha(m,n)$ as well as the precision of the estimate of the variance
of $\hat{y}_\alpha(m,n)$. To minimize the bias in $\hat{y}_\alpha(m,n)$, as well as improve the approximation
to normality of the individual $\hat{y}_{\alpha,j}(n)$, one would like $n$ to be large. A drawback of
increasing $n$ is the decrease in precision of the estimate of the variance of the point
estimate as well as a decrease in the degrees of freedom, $m - 1$, for the $t$ statistic,
which widens the confidence interval. Using (5) and (7), one can write the expansion
for the variance of the sectioned estimate in terms of $m$ only as

\[ \sigma^2_{\hat{y}_\alpha(m,n)} = \frac{\beta}{N + 2m} + \gamma\,\frac{m}{N^2} \tag{8} \]

where $\beta$ and $\gamma$ are constants determined by $F_Y(y)$ and $\alpha$. The presence of $m$ in
both the denominator and the numerator in (8) implies, for fixed $N$, that the value
of $m$ which minimizes the variance is a function of the relative magnitudes of $\beta$
and $\gamma$. If $\beta$ is small relative to $\gamma$, one should choose a small $m$ in order to minimize
the variance. The value for $m$ must be at least 2 in order to use (7) to estimate
the variance. Values for $m$ and $n$ which will minimize the variance or the mean
square error of the point estimate can be determined as functions of terms such
as $\beta$ and $\gamma$. However, these terms are in turn functions of the distribution of $Y$
which is unknown. After consideration of the above, Lewis and Orav (1989, p. 262)
suggest as a "rough rule of thumb" to make $m$ between 12 and 20 for samples with
$N$ over 1000. This usually gives sufficient precision for the estimate of the variance
of $\hat{y}_\alpha(m,n)$.

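Once $m$ and $n$ have been selected, the variance of the point estimate can be
estimated. Equation (5) shows that $\sigma^2_{\hat{y}_\alpha(n)}$ is a decreasing function of $n$. For fixed
$m$, a decrease in $\sigma^2_{\hat{y}_\alpha(n)}$ will cause a corresponding decrease in $\sigma^2_{\hat{y}_\alpha(m,n)}$. A technique
for reducing $\sigma^2_{\hat{y}_\alpha(n)}$ without increasing $n$ is linear controls.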

3 LINEAR CONTROL OF QUANTILES 

3.1 Single and Multiple Linear Controls 

3.1.1 A Single Linear Control 

Linear controls is a variance reduction technique which can be used to reduce the
variance of an estimate of a statistic of interest, often a sample mean. The statistic
of interest in this paper is the quantile estimator $\hat{y}_\alpha(n)$ from (2) and eventually the
individual section estimate $\hat{y}_{\alpha,j}(n)$ from (6).
To use a linear control for variance reduction, a random variable generated in
the simulation, called the control or control variable, which is correlated with $\hat{y}_\alpha(n)$,
must be available. The expected value of the control must be known, either exactly
or approximately. Let $C$ be a random variable which is generated via simulation.
Although an estimator of the $\alpha$ quantile of $C$ is not necessarily the most effective
control for a given quantile of $Y$, for purposes of discussion we will use as the control
variable the estimator of the $\alpha$ quantile of $C$ as defined in (2), namely $\hat{c}_\alpha(n)$. The
random variable $\hat{c}_\alpha(n)$ is a function of $n$ i.i.d. samples of the random variable $C$.
If $\hat{c}_\alpha(n)$ is generated as part of the simulation that produces the samples of $Y$ it
will be called an internal control variable. If $\hat{c}_\alpha(n)$ is generated as output from a
different simulation, it will be called an external control variable.

The linear control scheme for variance reduction, with a single control, uses as a
control function a linear additive combination of the control and its expected value to
produce a controlled estimate $\hat{y}'_\alpha(n)$, where the prime applied to an estimate implies
that it is a controlled estimate. The control function, with coefficient $\theta$, is subtracted
from the uncontrolled or crude estimate $\hat{y}_\alpha(n)$ to produce the controlled estimate
as follows:

\[ \hat{y}'_\alpha(n) = \hat{y}_\alpha(n) - \theta\left\{\hat{c}_\alpha(n) - E[\hat{c}_\alpha(n)]\right\}. \tag{9} \]

Putting aside the question of sectioning for now, the purpose of using a control
is to minimize the variance of the controlled estimate, $\sigma^2_{\hat{y}'_\alpha(n)}$, for a fixed sample
size $n$. If the statistic of interest is $\hat{y}_{\alpha,j}(n)$ from (6), minimizing its variance will,
for fixed $m$, minimize the variance of the section estimate $\hat{y}_\alpha(m,n)$. The value of $\theta$
which minimizes $\sigma^2_{\hat{y}'_\alpha(n)}$ can be determined using differentiation to be the regression
coefficient from the regression of $\hat{y}_\alpha(n)$ on $\hat{c}_\alpha(n)$:

\[
\theta = \frac{\sigma_{\hat{y}_\alpha(n),\hat{c}_\alpha(n)}}{\sigma^2_{\hat{c}_\alpha(n)}}
 = \frac{\sigma_{\hat{y}_\alpha(n)}}{\sigma_{\hat{c}_\alpha(n)}}\,\rho\bigl(\hat{y}_\alpha(n), \hat{c}_\alpha(n)\bigr)
\tag{10}
\]

where $\sigma_{\hat{y}_\alpha(n),\hat{c}_\alpha(n)}$ is the covariance of $\hat{y}_\alpha(n)$ and $\hat{c}_\alpha(n)$ and $\rho(\hat{y}_\alpha(n), \hat{c}_\alpha(n))$ is the
correlation between $\hat{y}_\alpha(n)$ and $\hat{c}_\alpha(n)$.
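The differentiation step behind (10) can be written out explicitly. The lines below are a standard derivation rather than text from the original; to lighten notation they write $\sigma_y^2$, $\sigma_c^2$, $\sigma_{yc}$ and $\rho$ for the variances, covariance and correlation of the crude estimate and the control.

```latex
\begin{align*}
\operatorname{var}[\hat{y}'_\alpha(n)]
  &= \sigma_y^2 - 2\theta\,\sigma_{yc} + \theta^2 \sigma_c^2, \\
\frac{d}{d\theta}\operatorname{var}[\hat{y}'_\alpha(n)]
  &= -2\sigma_{yc} + 2\theta\,\sigma_c^2 = 0
  \quad\Longrightarrow\quad \theta = \frac{\sigma_{yc}}{\sigma_c^2}, \\
\min_\theta \operatorname{var}[\hat{y}'_\alpha(n)]
  &= \sigma_y^2 - \frac{\sigma_{yc}^2}{\sigma_c^2}
   = \sigma_y^2\,(1 - \rho^2).
\end{align*}
```

The second derivative, $2\sigma_c^2$, is positive, so the stationary point is a minimum. The last line gives the variance ratio $1 - \rho^2$ on which the percent variance reduction of Section 3.2 is based.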

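3.1.2 Multiple Linear Controls

One can use multiple controls for variance reduction where $\hat{c}_\alpha(n)$ and $\theta$ become
$p$-dimensional column vectors, $\hat{\mathbf{c}}_\alpha(n)$ and $\boldsymbol{\theta}$, with components $\hat{c}_{\alpha,i}(n)$ and $\theta_i$, for
$i = 1, \ldots, p$. With multiple controls, equation (9) becomes

\[ \hat{y}'_\alpha(n) = \hat{y}_\alpha(n) - \boldsymbol{\theta}^{T}\left\{\hat{\mathbf{c}}_\alpha(n) - E[\hat{\mathbf{c}}_\alpha(n)]\right\}. \tag{11} \]

It can be shown (see Kendall and Stuart, 1977, chap. 27) that in the multiple control
case, the values for $\boldsymbol{\theta}$ which minimize $\sigma^2_{\hat{y}'_\alpha(n)}$ are the multiple regression coefficients

\[ \boldsymbol{\theta} = \left(\Sigma_{\hat{\mathbf{c}}_\alpha(n)}\right)^{-1} \boldsymbol{\sigma}_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)} \tag{12} \]

where $\Sigma_{\hat{\mathbf{c}}_\alpha(n)}$ is the covariance matrix of $\hat{\mathbf{c}}_\alpha(n)$ and $\boldsymbol{\sigma}_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)}$ is the $p$-dimensional
vector with components $\operatorname{cov}(\hat{y}_\alpha(n), \hat{c}_{\alpha,i}(n))$, for $i = 1, \ldots, p$.
Rubinstein and Marcus (1985) demonstrated that the solution for $\boldsymbol{\theta}$ in the linear
control of a single response, $\hat{y}_\alpha(n)$, is a special case of determining the canonical
correlation coefficients for maximizing the correlation between linear combinations
of multiple responses and multiple controls.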

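3.2 A Measure of the Effectiveness of a Control for Variance Reduction

One measure of effectiveness for a particular linear control is the percent variance
reduction, which involves the ratio of the variance of the controlled estimate $\hat{y}'_\alpha(n)$
to that of the uncontrolled estimate $\hat{y}_\alpha(n)$. A high percent variance reduction implies that
the control is effective at reducing the variance of the point estimate. For a single
control, assuming the optimal value for $\theta$ is known, the percent variance reduction
is

\[ 1 - \frac{\sigma^2_{\hat{y}'_\alpha(n)}}{\sigma^2_{\hat{y}_\alpha(n)}} = \rho^2\bigl(\hat{y}_\alpha(n), \hat{c}_\alpha(n)\bigr). \tag{13} \]

Equation (13) implies that for the control to be effective, one should choose a random
variable which is "strongly" correlated with $\hat{y}_\alpha(n)$ to be the control variable $\hat{c}_\alpha(n)$.
For multiple controls, the percent variance reduction is the direct generalization

\[ 1 - \frac{\sigma^2_{\hat{y}'_\alpha(n)}}{\sigma^2_{\hat{y}_\alpha(n)}} = R^2_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)} \tag{14} \]

where

\[
R^2_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)} =
\frac{\boldsymbol{\sigma}^{T}_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)} \left(\Sigma_{\hat{\mathbf{c}}_\alpha(n)}\right)^{-1} \boldsymbol{\sigma}_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)}}{\sigma^2_{\hat{y}_\alpha(n)}}
\]

is the square of the multiple correlation coefficient between $\hat{y}_\alpha(n)$ and $\hat{\mathbf{c}}_\alpha(n)$. As
before, the effectiveness of the control depends upon a large value for $R^2_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)}$.

When the number of multiple controls to use is given, one should simply choose
those controls which maximize the $R^2_{\hat{y}_\alpha(n),\hat{\mathbf{c}}_\alpha(n)}$. However, determining the number
of multiple controls to use is a more difficult problem which is complicated by the
necessity of estimating the coefficients in $\boldsymbol{\theta}$.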

3.3 Use of the Asymptotic Expected Value as an Approximation 
for the Expected Value of the Control 

When using a linear control for variance reduction, the expected value of the
control is subtracted from the control variable in the control function as in (9) so
that the control function will have a mean of zero. A mean-zero control function is
desirable when controlling an unbiased estimator such as a sample mean so that the
controlled estimate is also unbiased. However, expected values of quantile estimators
are rarely known exactly. If the values of the density function of $C$ and its derivative
at $c_\alpha$ are known, the biased expected value of the quantile estimator from (4) can
be subtracted in the control function so that the control function does not affect
the first-order bias in the controlled quantile estimate. If the expected value of the
biased quantile estimator is not known, it can be approximated by the asymptotic
expected value of the estimator, i.e. the actual quantile value $c_\alpha$. The value $c_\alpha$
will replace $E[\hat{c}_\alpha(n)]$ in the control function in (9). While this causes the control
function to have order $1/n$ bias, there is already order $1/n$ bias in the estimate being
controlled, so that the order of the bias in the controlled estimate $\hat{y}'_\alpha(n)$ is the same
as in the uncontrolled estimate.

Even when the biased expected value for the control from (4) is known, it may be
desirable to use the asymptotic value. There is empirical evidence, and it can
be shown analytically, that use of a control function with order $1/n$ bias can actually
decrease the magnitude of the first-order bias in the controlled estimate.
For example, let $b_{\hat{y}_\alpha(n)}$ denote the first-order bias of $\hat{y}_\alpha(n)$ computed using (4) as
$b_{\hat{y}_\alpha(n)} = E[\hat{y}_\alpha(n)] - y_\alpha + O(1/n^2)$ and let $b_{\hat{c}_\alpha(n)}$ denote the bias of $\hat{c}_\alpha(n)$ computed
similarly. When the linear control scheme (9) is used to control a quantile estimate,
$b_{\hat{y}_\alpha(n)}/b_{\hat{c}_\alpha(n)}$ is positive, and

\[ 0 < \theta < 2\,\frac{b_{\hat{y}_\alpha(n)}}{b_{\hat{c}_\alpha(n)}}, \]

the magnitude of the first-order bias of the controlled estimate is less than the
magnitude of the first-order bias of the uncontrolled estimate.
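The bias condition can be checked with a short calculation. The lines below are a sketch rather than text from the original; they write $b_y$ and $b_c$ for the first-order biases of the crude estimate and of the control, assume $b_y \neq 0$, and assume that $c_\alpha$ is used in place of $E[\hat{c}_\alpha(n)]$ in (9).

```latex
\begin{align*}
E[\hat{y}'_\alpha(n)] - y_\alpha
  &= \bigl(E[\hat{y}_\alpha(n)] - y_\alpha\bigr)
     - \theta\,\bigl(E[\hat{c}_\alpha(n)] - c_\alpha\bigr)
   \approx b_y - \theta\, b_c, \\
|b_y - \theta\, b_c| < |b_y|
  &\iff \left|1 - \theta\,\frac{b_c}{b_y}\right| < 1
   \iff 0 < \theta\,\frac{b_c}{b_y} < 2
   \iff 0 < \theta < 2\,\frac{b_y}{b_c}
   \quad \text{when } \frac{b_y}{b_c} > 0.
\end{align*}
```

In words, subtracting a control whose bias has the same sign as that of the crude estimate cancels part of that bias, provided the coefficient is not so large that it overshoots.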

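If we are using sectioning to generate the overall point estimate and an estimate
of the variance (standard deviation) of the point estimate, and we assume that $\theta$ is
known, equations (6) and (7) can be combined with the linear control equation, (9),
to get

\[ \hat{y}'_\alpha(m,n) = \frac{1}{m} \sum_{j=1}^{m} \hat{y}'_{\alpha,j}(n) \tag{15} \]

\[ \phantom{\hat{y}'_\alpha(m,n)} = \frac{1}{m} \sum_{j=1}^{m} \left[\hat{y}_{\alpha,j}(n) - \theta\left\{\hat{c}_{\alpha,j}(n) - E[\hat{c}_\alpha(n)]\right\}\right] \tag{16} \]

with an unbiased estimate of the variance of the controlled estimate of

\[ S^2_{\hat{y}'_\alpha(m,n)} = \frac{1}{m(m-1)} \sum_{j=1}^{m} \left[\hat{y}'_{\alpha,j}(n) - \hat{y}'_\alpha(m,n)\right]^2. \tag{17} \]

These results are straightforward. It is when $\theta$ is not known, the usual case, and
has to be estimated using sectioning, that estimating the variance of the controlled
estimate requires some care.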

3.4 Estimating the Coefficients 

In the usual case in simulation, the values for $\theta$ or $\boldsymbol{\theta}$ must be estimated since
not enough information is known about the joint distribution of $\hat{y}_\alpha(n)$ and $\hat{\mathbf{c}}_\alpha(n)$ to
determine the regression coefficients. For notation's sake, assume that one is using
a single control. If using sectioning to estimate the point estimate along with its
variance, the sectioned estimates $\hat{y}_{\alpha,j}(n)$ and $\hat{c}_{\alpha,j}(n)$, for $j = 1, \ldots, m$, are available
to use to estimate $\theta$. One could generate sample estimates of the variance and
covariances in (10) to estimate $\theta$; however, since $\theta$ is the coefficient of regression, an
equivalent but computationally more convenient method for estimating $\theta$ is to use
linear least-squares regression.
The regression coefficient $\theta$ can be estimated by the least squares regression of
$[\hat{y}_{\alpha,j}(n) - \hat{y}_\alpha(m,n)]$ on $[\hat{c}_{\alpha,j}(n) - c_\alpha]$ using the regression model

\[ [\hat{y}_{\alpha,j}(n) - \hat{y}_\alpha(m,n)] = \theta\,[\hat{c}_{\alpha,j}(n) - c_\alpha] + \epsilon_j, \qquad j = 1, \ldots, m \tag{18} \]

where the $\hat{c}_{\alpha,j}(n)$ are considered fixed and $\epsilon_j$ is a mean-zero random variable
independent of $\hat{c}_{\alpha,j}(n)$. Denote by $\hat{\theta}(m,n)$ the estimate of $\theta$ from a regression which
used $m$ estimates for both the dependent variable and the predictor variable, where
each of the estimates was based on $n$ independent samples of $Y$ or $C$ as appropriate.

Once $\hat{\theta}(m,n)$ is computed, the controlled estimate for each section can be computed
using (9) as

\[ \hat{y}'_{\alpha,j}(n) = \hat{y}_{\alpha,j}(n) - \hat{\theta}(m,n)\left\{\hat{c}_{\alpha,j}(n) - c_\alpha\right\} \tag{19} \]

where $c_\alpha$ is the approximation for the expected value of the control. The final
controlled section estimate, $\hat{y}'_\alpha(m,n)$, can be computed using (15) as the sample
mean of the controlled estimates from each section. Unfortunately, estimating the
variance of the $\hat{y}'_\alpha(m,n)$ with (17) is not as straightforward since the individual
$\hat{y}'_{\alpha,j}(n)$ are generally no longer independent because of the common $\hat{\theta}(m,n)$. The
characteristics of the quantile estimates and the variance estimates depend upon the
joint distribution of $\hat{y}_\alpha(n)$ and $\hat{c}_\alpha(n)$.
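The estimation chain (18), (19), (15) and (17) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the function name `controlled_section_estimate` is hypothetical, the section estimates are taken as already computed, and model (18) is read literally as a least-squares fit with no intercept term.

```python
import statistics


def controlled_section_estimate(y_sections, c_sections, c_alpha):
    """Return (theta_hat, controlled point estimate, variance estimate).

    y_sections, c_sections: per-section quantile estimates of Y and of the
    control C. c_alpha: the known quantile of C used in place of the
    expected value of the control.
    """
    m = len(y_sections)
    if m < 2 or m != len(c_sections):
        raise ValueError("need matching lists with at least two sections")
    y_bar = statistics.fmean(y_sections)
    dy = [y - y_bar for y in y_sections]      # response in model (18)
    dc = [c - c_alpha for c in c_sections]    # predictor in model (18)
    denom = sum(d * d for d in dc)
    if denom == 0.0:
        raise ValueError("control estimates all equal c_alpha")
    # Least squares through the origin for model (18).
    theta_hat = sum(a * b for a, b in zip(dy, dc)) / denom
    # Controlled section estimates, equation (19).
    controlled = [y - theta_hat * d for y, d in zip(y_sections, dc)]
    point = statistics.fmean(controlled)                 # equation (15)
    var_of_mean = statistics.variance(controlled) / m    # equation (17)
    return theta_hat, point, var_of_mean


if __name__ == "__main__":
    print(controlled_section_estimate([3.0, 5.0, 7.0], [0.0, 1.0, 2.0], 1.0))
```

Because every controlled section estimate shares the same `theta_hat`, the variance returned here is subject to the dependence caveat stated in the text.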

3.4.1 Subtleties with the Joint Distribution of the Estimators 

A key point of linear controls for quantile estimates is that the joint distribution
of the statistic being controlled and the control statistic, here $\hat{y}_\alpha(n)$ and $\hat{c}_\alpha(n)$, is
of primary importance for determining $\theta$ and the characteristics of the controlled
estimate, not the joint distribution of the underlying populations $Y$ and $C$.

This is in contrast to the use of a linear control for controlling an estimate of the
mean, $\bar{y}$, with the sample mean of the control, $\bar{c}$. In this case, one can determine $\theta$
as a function of the joint distribution of $Y$ and $C$ since, using (10),

\[ \theta = \frac{\operatorname{cov}(\bar{y}, \bar{c})}{\operatorname{var}[\bar{c}]} = \frac{\operatorname{cov}(Y, C)}{\operatorname{var}[C]}. \]

Although the joint distribution of $\bar{y}$ and $\bar{c}$ is different from the joint distribution of
$Y$ and $C$, one can estimate $\theta$ using estimates of the population covariances based
on the $N$ individual samples. In general, when controlling estimators other than
the sample mean, one must estimate the covariances from the joint distribution of
the controlled statistic and the control, not the joint distribution of the underlying
populations.

3.4.2 Sectioning with the Assumption that the Joint Distribution is 
Multivariate Normal 

If the joint distribution of $\hat{y}_\alpha(n)$ and $\hat{c}_\alpha(n)$ is multivariate normal and $\theta$ is estimated,
the point estimate of the quantile and the estimate of the variance of the point
estimate have several nice properties:

• the controlled estimates for each section, $\hat{y}'_{\alpha,j}(n)$, are i.i.d. since the sample
covariance matrix of the $\hat{c}_{\alpha,j}(n)$ is independent of their sample mean,

• $S^2_{\hat{y}'_\alpha(m,n)}$, the estimate of the variance of $\hat{y}'_\alpha(m,n)$ from (17) where $\hat{y}'_{\alpha,j}(n)$ is
computed using (19), is an unbiased estimator, and

• one can develop an unconditional confidence interval for $\hat{y}_\alpha(m,n)$ using the
$t$ statistic following Lavenberg, Moeller and Welch (1982) since conditionally
unbiased estimators remain unbiased unconditionally and conditional confidence
intervals remain valid unconditionally (see Kendall and Stuart, 1977,
p. 379).

When the multivariate normal assumption is not valid,

• the controlled estimates from each section, $\hat{y}'_{\alpha,j}(n)$, are no longer independent
since the sample mean and covariance matrix are no longer independent. The
controlled estimates also have additional $O(1/m)$ bias from the estimation of $\theta$,

• $S^2_{\hat{y}'_\alpha(m,n)}$ from (17) can still be used to estimate the variance of $\hat{y}'_\alpha(m,n)$,
although it is now biased, and

• even if the $\hat{y}'_{\alpha,j}(n)$ are normally distributed, a confidence interval based on
a $t$ statistic is only approximate because of the lack of independence of the
individual section estimates.

One method for maintaining independence between the controlled section estimates
at the cost of a loss of variance reduction is to estimate $\theta$ independently for each
section.



3.4.3 Subsectioning 

An alternative to estimating a single $\hat{\theta}(m,n)$, which couples the $\hat{y}'_{\alpha,j}(n)$ together so
that they are no longer independent, is to generate an individual estimate of $\theta$ for
each section. This can be done by subsectioning the $n$ samples within the section
and calculating quantile estimates within the section to use as data to estimate
$\hat{\theta}_j(v,l)$. More formally, for each $j$th section, for $j = 1, \ldots, m$,

1. divide the $n$ samples into $v$ subsections of length $l$ where $v \times l = n$, and

2. estimate $\hat{y}_{\alpha,j,k}(l)$ and $\hat{c}_{\alpha,j,k}(l)$ in each subsection, for $k = 1, \ldots, v$.

3. Use the $v$ sets of subsection estimates $\hat{y}_{\alpha,j,k}(l)$ and $\hat{c}_{\alpha,j,k}(l)$ from the $j$th section
to estimate $\hat{\theta}_j(v,l)$ using a regression model similar to (18).

Once $\hat{\theta}_j(v,l)$ has been estimated, the controlled estimate for the $j$th section is computed
as

\[ \hat{y}'_{\alpha,j}(n) = \hat{y}_{\alpha,j}(n) - \hat{\theta}_j(v,l)\left(\hat{c}_{\alpha,j}(n) - c_\alpha\right). \tag{20} \]

The equation is similar to (19), only now there is a subscript on $\hat{\theta}$, which also has
different arguments. The final controlled estimate is calculated as before, as a sample
mean using (15), and the estimate of variance of the point estimates is calculated
using (17).
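Steps 1 to 3 and equation (20) for a single section can be sketched as below. The names are hypothetical, subsections are taken as consecutive slices, and the "regression model similar to (18)" is assumed here to be a no-intercept fit of the centered subsection estimates of $Y$ on the deviations of the subsection control estimates from $c_\alpha$.

```python
import math
import statistics


def sample_quantile(sample, alpha):
    """Order-statistic estimator of the alpha quantile, rule (2)."""
    ordered = sorted(sample)
    na = len(ordered) * alpha
    r = round(na) if math.isclose(na, round(na)) else math.floor(na) + 1
    return ordered[r - 1]


def subsection_controlled_estimate(y_section, c_section, v, alpha, c_alpha):
    """Return (theta_j, controlled estimate) for one section, equation (20)."""
    l = len(y_section) // v  # subsection length
    if v < 2 or l < 1 or len(y_section) != len(c_section):
        raise ValueError("need v >= 2 non-empty subsections of paired samples")
    # Steps 1-2: quantile estimates within each subsection.
    y_sub = [sample_quantile(y_section[k * l:(k + 1) * l], alpha) for k in range(v)]
    c_sub = [sample_quantile(c_section[k * l:(k + 1) * l], alpha) for k in range(v)]
    # Step 3: no-intercept least squares, analogous to model (18).
    y_bar = statistics.fmean(y_sub)
    dc = [c - c_alpha for c in c_sub]
    denom = sum(d * d for d in dc)
    if denom == 0.0:
        raise ValueError("subsection control estimates all equal c_alpha")
    theta_j = sum((y - y_bar) * d for y, d in zip(y_sub, dc)) / denom
    # Equation (20): control the full-section estimate with its own theta_j.
    y_full = sample_quantile(y_section, alpha)
    c_full = sample_quantile(c_section, alpha)
    return theta_j, y_full - theta_j * (c_full - c_alpha)


if __name__ == "__main__":
    c = [1, 2, 3, 4, 5, 6, 7, 8]
    y = [2 * x for x in c]
    print(subsection_controlled_estimate(y, c, v=2, alpha=0.5, c_alpha=3.0))
```

Each section's coefficient depends only on that section's data, which is what restores independence between the controlled section estimates.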

An advantage of subsectioning is that by using an independent estimate of $\theta$ to
calculate each section's controlled estimate, the $\hat{y}'_{\alpha,j}(n)$ are now i.i.d. A disadvantage
of using subsectioning is the loss of predicted variance reduction. This occurs
for two reasons. The first is that instead of needing one estimate of $\theta$, now $m$ estimates
are needed and each additional estimate tends to reduce the achieved percent
variance reduction. The second reason is that $\hat{\theta}(v,l)$ is not an unbiased estimator
of the regression coefficient for $\hat{y}_\alpha(n)$ on $\hat{c}_\alpha(n)$ since it is calculated using quantile
estimates based on $l$ samples, which have a different joint distribution than $\hat{y}_\alpha(n)$
and $\hat{c}_\alpha(n)$. There can also be some additional bias in the $\hat{y}'_{\alpha,j}(n)$ from the estimation
of $\theta_j$.

3.4.4 Splitting and the Jackknife

Other methods which have been used with linear controls for calculating a point 
estimate and the variance of the point estimate include splitting and the Jackknife. 
Each of these techniques is described in Lewis and Orav (1989, chap. 9) and in 
Nelson (1988). 

The splitting technique removes the bias caused by estimating $\theta$ with the same
data being controlled at the cost of reducing the percent variance reduction. Splitting
has been described in Tocher (1963, p. 115) and then in Beale (1985). When
using sectioning to generate $m$ individual section quantile estimates $\hat{y}_{\alpha,j}(n)$ and
$\hat{c}_{\alpha,j}(n)$, for $j = 1, \ldots, m$, the splitting procedure generates an estimate of $\theta$ for each
section. The estimate of $\theta$ for the $j$th section is computed using all of the section
estimates except the $j$th set of estimates. The controlled estimate for each section
is computed using (20) with $\hat{\theta}_j(m-1,n)$. The final controlled estimate and its
variance are computed as before as the sample mean of the individual controlled
section estimates and the sample variance of the sample mean.
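The splitting procedure amounts to a leave-one-out estimate of the coefficient. The sketch below is illustrative: the function name is hypothetical, and the per-section regression is assumed to be the no-intercept model (18) fitted to the $m - 1$ retained sections, centered on their own mean.

```python
import statistics


def splitting_controlled_estimate(y_sections, c_sections, c_alpha):
    """Return (controlled point estimate, variance estimate, controlled list)."""
    m = len(y_sections)
    if m < 3 or m != len(c_sections):
        raise ValueError("need matching lists with at least three sections")
    controlled = []
    for j in range(m):
        # Leave out the j-th pair of section estimates.
        ys = y_sections[:j] + y_sections[j + 1:]
        cs = c_sections[:j] + c_sections[j + 1:]
        y_bar = statistics.fmean(ys)
        denom = sum((c - c_alpha) ** 2 for c in cs)
        if denom == 0.0:
            raise ValueError("retained control estimates all equal c_alpha")
        theta_j = sum((y - y_bar) * (c - c_alpha) for y, c in zip(ys, cs)) / denom
        # Control section j with a coefficient that never saw section j.
        controlled.append(y_sections[j] - theta_j * (c_sections[j] - c_alpha))
    point = statistics.fmean(controlled)
    var_of_mean = statistics.variance(controlled) / m
    return point, var_of_mean, controlled


if __name__ == "__main__":
    print(splitting_controlled_estimate([0.0, 2.0, 4.0, 6.0], [0.0, 1.0, 2.0, 3.0], 1.5))
```

Each coefficient is independent of the section it controls, but any two coefficients share $m - 2$ sections of data, so the controlled section estimates remain dependent on one another.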

The splitting estimator eliminates the bias in $\hat{y}'_{\alpha,j}(n)$ due to estimating $\theta$. However,
like the sectioning estimator it has the disadvantage that the $\hat{y}'_{\alpha,j}(n)$ are no
longer independent. It also has the same disadvantage as the subsection estimator
in that $m$ estimates of $\theta$ must be computed, reducing the percent variance reduction.
The primary purpose for using the splitting estimator has been to eliminate
the $O(1/m)$ bias in the controlled estimate from the estimation of $\theta$ in non-normal
samples when controlling unbiased estimators. Since the quantile estimator already
has $O(1/n)$ bias, which is unaffected by splitting, and splitting has no other clear
advantages over the section or subsection estimator, we chose not to use it.

Jackknifing is a method for removing the $O(1/n)$ bias in $\hat{y}_\alpha(n)$ at the price of
uncertainty about the loss of percent variance reduction in small to medium sized
samples. For an "$m$-fold" jackknife estimate, one combines an estimate based on the
entire data set, $\hat{y}_{\alpha,0}(N)$, with $m$ estimates, each based on the data set with $N/m$
samples deleted, $\hat{y}_{\alpha,j}(N-m)$, for $j = 1, \ldots, m$, to get a set of $m$ "pseudo values"
$\tilde{y}_{\alpha,j}(N-m)$, for $j = 1, \ldots, m$. The final jackknife point estimate is the sample mean
of the pseudo values. In some circumstances, one can also use the sample variance of
the sample mean of the pseudo values as an estimate of the variance of the jackknife
point estimate.
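A sketch of an $m$-fold jackknife for the crude quantile estimate follows. The text does not spell out the pseudo-value formula, so the standard delete-a-group form, $m$ times the full-sample estimate minus $(m-1)$ times the leave-one-group-out estimate, is assumed here; the function names are hypothetical.

```python
import math
import statistics


def sample_quantile(sample, alpha):
    """Order-statistic estimator of the alpha quantile, rule (2)."""
    ordered = sorted(sample)
    na = len(ordered) * alpha
    r = round(na) if math.isclose(na, round(na)) else math.floor(na) + 1
    return ordered[r - 1]


def jackknife_quantile(samples, m, alpha):
    """Return (jackknife point estimate, variance estimate, pseudo values)."""
    n = len(samples) // m  # size of each deleted group
    if m < 2 or n < 1:
        raise ValueError("need at least two non-empty groups")
    data = list(samples[:m * n])  # drop leftovers so groups are equal-sized
    full = sample_quantile(data, alpha)
    pseudo = []
    for j in range(m):
        # Estimate with the j-th group of n samples deleted.
        reduced = data[:j * n] + data[(j + 1) * n:]
        leave_out = sample_quantile(reduced, alpha)
        # Assumed standard delete-a-group pseudo value.
        pseudo.append(m * full - (m - 1) * leave_out)
    point = statistics.fmean(pseudo)
    var_of_mean = statistics.variance(pseudo) / m
    return point, var_of_mean, pseudo


if __name__ == "__main__":
    print(jackknife_quantile(list(range(1, 13)), m=3, alpha=0.5))
```

Because a sample quantile moves in jumps as observations are deleted, the pseudo values can be widely scattered, which is consistent with the reservations about the jackknife for quantiles discussed in this section.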

The jackknife estimate has an advantage over the section and subsection estimators
in that the bias of the quantile estimates is reduced since each pseudo value
is based on estimates using $N - m$ instead of $N/m$ samples. Unfortunately it has
some disadvantages as well. Lavenberg, Moeller and Welch (1982) examined the use
of the jackknife when using a linear control for the sample mean under the assumption
of a multivariate normal distribution between the statistic of interest and the
control. They found that the jackknifed confidence interval was usually larger and
more computationally expensive than the standard linear control based confidence
interval. Nelson (1988) compared the performance of several methods for linear
control of the mean when the normality assumption was violated and found that
the jackknife was usually "dominated" by the splitting estimator.

The jackknife has been used in quantile estimation. Seila (1982) used a 2-fold 
jackknife for removing the bias of quantile estimates; however, he used a sectioning
approach for estimating the variance of the point estimate, not the jackknife estimate 
for the variance of the point estimate. Miller (1974), and Efron and Gong (1983) 
imply that the jackknife technique may not be an appropriate tool for use with 
quantile estimation because of the discontinuous, nonlinear nature of quantile esti- 
mators such as (2). Our empirical results (presented in the last section) confirmed 
that the jackknife was not suitable for computing quantile estimates and estimates 
of the variance of the jackknife point estimate because of the high variability of the 
point estimates and the poor performance of the jackknife estimate of the variance 
of the jackknife point estimator. 

3.5 The Loss Factor 

In general, regardless of the method chosen, estimating the coefficients can cause
a reduction in the percent variance reduction predicted by (13) or (14). Lavenberg,
Moeller and Welch (1982) investigated the decrease in predicted variance reduction
caused by using the individual samples to estimate $\boldsymbol{\theta}$ for a linear control of the sample
mean. Under the assumption of multivariate normality between the statistic of
interest and the control, they concluded that the decrease in variance reduction due
to estimating $\boldsymbol{\theta}$ could be predicted by multiplying the variance ratio $1 - R^2(\cdot)$ implied
by (14) by a "loss factor".
The loss factor was $(m-2)/(m-p-2)$ where $m$ was the number of independent
samples of the statistic being controlled and $p$ was the number of controls whose
coefficients had to be estimated. The loss factor is a deterrent to adding more controls
simply to achieve a small increase in the $R^2$ in (14). As one selects more controls
for a multiple control scheme, the impact of the loss factor can quickly overcome
the benefits of increasing the $R^2$. Thus one cannot guarantee an improvement in
the effectiveness of a linear control by simply adding more controls.
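The effect of the loss factor can be illustrated numerically. The sketch below reads the loss factor as a multiplier on the variance ratio $1 - R^2$, which is the form given by Lavenberg, Moeller and Welch (1982); the function name and the example values of $R^2$, $m$ and $p$ are hypothetical.

```python
def net_variance_ratio(r_squared, m, p):
    """Predicted var(controlled) / var(crude) when p coefficients are estimated
    from m independent replicates: (1 - R^2) * (m - 2) / (m - p - 2)."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie in [0, 1]")
    if m - p - 2 <= 0:
        raise ValueError("need m > p + 2 for the loss factor to be defined")
    return (1.0 - r_squared) * (m - 2) / (m - p - 2)


if __name__ == "__main__":
    m = 12  # number of sections
    # One control with R^2 = 0.60 versus three controls with R^2 = 0.62.
    print(net_variance_ratio(0.60, m, p=1))
    print(net_variance_ratio(0.62, m, p=3))
```

With these illustrative numbers the three-control scheme has the higher $R^2$ but the larger predicted variance ratio, because with only twelve sections the loss factor rises from $10/9$ to $10/7$.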

3.6 Measuring the Effectiveness of a Control at Reducing Sample
Sizes

Lewis and Orav (1989, p. 262) mention an alternative measure for quantifying
the effectiveness of a control scheme. They look at the square root of the ratio of
the variance of the uncontrolled estimate to the variance of the controlled estimate.
This ratio can be considered to be the ratio of the sample size that would be needed
to achieve a given standard deviation without using the control scheme, to the
sample size needed to achieve the same standard deviation using the control. When
expressed in terms of the correlation coefficient for the controlled statistic and the
control, the ratio becomes $1/(1 - \rho^2(\cdot))^{1/2}$. Given a value for $\rho(\cdot)$, the formula gives
the increase in the sample size that would be needed to achieve the same standard
deviation without the control. Given a desired reduction in sample size, say 1/2, the
formula implies that to achieve a given standard deviation while cutting the sample
size in half, one must have $1 - \rho^2(\cdot) = .25$, which implies a correlation coefficient
of approximately ±0.87.

Linear controls are typically unable to reduce the sample size by as much as a 
half because the correlation between the statistic of interest and a linear function 
of the control variables is not high enough. Since many statistics have a nonlinear 
relationship with the control variables, one possible means for increasing the variance 
reduction for a given set of controls is to allow nonlinear transformations of the 
controls. 

4 NONLINEAR CONTROLS 

4.1 Definition of a Nonlinear Control 

One can generalize the linear control scheme for p controls, (11), to include
nonlinear transformations of random variables as controls for variance reduction
as shown in Lewis, Ressler and Wood (1989). Let $h_i(\hat{c}_{\alpha,i}(n), \theta_i)$,
for $i = 1,\ldots,p$, be a transformation function of the random variable
$\hat{c}_{\alpha,i}(n)$ and let $\theta_i$ be a vector of coefficients where, depending
upon $h_i(\cdot)$, the vector $\theta_i$ may have more than one component. When
incorporating nonlinear transformations of multiple controls, the linear control
scheme (11) becomes

$$\hat{y}'_\alpha(n) = \hat{y}_\alpha(n) - H(\hat{\mathbf{c}}_\alpha(n), \theta) \qquad (21)$$

where for our purposes $H(\cdot)$ is a linear additive combination of the p transformed
controls, $h_i(\hat{c}_{\alpha,i}(n), \theta_i)$, and their expected values,
$E[h_i(\hat{c}_{\alpha,i}(n), \theta_i)]$, for $i = 1,\ldots,p$.
The vector $\theta$ contains the coefficients from the linear combination in addition to
the p sets of coefficients from the individual transformations.
$H(\hat{\mathbf{c}}_\alpha(n), \theta)$ will be referred to as the control function. A
control function with terms that are nonlinear in the unknown coefficients will be
said to be a nonlinear control. For ease of notation, the coefficients $\theta$ may be
suppressed in the expressions for $H(\cdot)$ and $h(\cdot)$. When there is only one
control so that $p = 1$, the subscript $i$ will be suppressed so that
$h_1(\cdot) = h(\cdot)$.

In some simulations possible control variables may have very low correlation
with $\hat{y}_\alpha(n)$. For a given control, two of the possible sources for the low
correlation between $\hat{y}_\alpha(n)$ and $\hat{c}_\alpha(n)$ are:

1. there is in fact very little structural relationship between $\hat{y}_\alpha(n)$ and
the control; i.e. a bivariate scatter plot of $\hat{y}_\alpha(n)$ versus
$\hat{c}_\alpha(n)$ would look patternless, or

2. the structural relationship between $\hat{y}_\alpha(n)$ and $\hat{c}_\alpha(n)$ is of
a nonlinear form which is poorly approximated by a straight line.






In the first case, a nonlinear control may or may not offer improvement over the linear 
control. In the second case, a nonlinear control can offer substantial improvement 
in variance reduction, as shown in Lewis, Ressler and Wood (1989). 

A simple example will show the potential benefits of nonlinear transformations. 
Let $z$ be a Normal (0,1) random variable which is being used to control the sample
mean of $w = z^2$. It follows that

$$\mathrm{cov}(w, z) = E[z^3] - E[z^2]E[z] = 0$$

so that $\rho(w, z)$ is zero, which implies zero effectiveness for the linear control
as well. Now allow the nonlinear transformation

$$h^*(z) = h(z, \theta) = z^\theta$$

with $\theta = 2$. The transformed random variable $h^*(z)$ is a $\chi^2_1$ random
variable with mean 1 and variance 2. It follows that

$$\mathrm{cov}(w, h^*(z)) = \mathrm{var}[z^2] = 2 \;\Rightarrow\; \rho(w, h^*(z)) = \frac{2}{2} = 1$$

so that the nonlinear control is completely effective. Therefore when evaluating a
potential control, one should ask: Can this random variable be transformed to have
a "high" correlation with the statistic of interest?
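
The example can be checked empirically. The Python sketch below draws standard
normal values, forms $w = z^2$, and compares the sample correlation of $w$ with the
untransformed control $z$ and with the transformed control $z^2$. The seed, the
sample size and the helper name `correlation` are arbitrary choices made for this
illustration.

```python
import random

def correlation(xs, ys):
    """Sample correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(12345)
z = [rng.gauss(0.0, 1.0) for _ in range(20000)]
w = [v * v for v in z]

rho_linear = correlation(w, z)                      # untransformed control
rho_nonlinear = correlation(w, [v * v for v in z])  # control transformed with theta = 2
print(f"linear: {rho_linear:.3f}   nonlinear: {rho_nonlinear:.3f}")
```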

4.2 The Existence of Optimal Nonlinear Transformations 

For some random variables, transformations do exist which will improve their
correlation with $\hat{y}_\alpha(n)$.

• Let $\hat{y}_\alpha(n)$ and $\hat{\mathbf{c}}_\alpha(n)$, with p components
$\hat{c}_{\alpha,i}(n)$, for $i = 1,\ldots,p$, be random variables with a general but
nonsingular joint distribution.

• Let $g(\hat{y}_\alpha(n)) = g(\hat{y}_\alpha(n), \phi)$ and
$h_i(\hat{c}_{\alpha,i}(n)) = h_i(\hat{c}_{\alpha,i}(n), \theta_i)$, for
$i = 1,\ldots,p$, be mean-zero transformation functions of random variables
$\hat{y}_\alpha(n)$ and $\hat{c}_{\alpha,i}(n)$ such that
$\mathrm{var}[g(\hat{y}_\alpha(n))] = 1$ and
$\mathrm{var}[h_i(\hat{c}_{\alpha,i}(n))] < \infty$, for $i = 1,\ldots,p$.

Breiman and Friedman (1985) proved the existence of optimal transformations for
maximizing the correlation between $g(\hat{y}_\alpha(n))$ and
$H(\hat{\mathbf{c}}_\alpha(n))$, a linear additive function of the mean-zero
$h_i(\hat{c}_{\alpha,i}(n))$. The optimal transformation for one variable can
be expressed in terms of the conditional expected values of given transformations of
the other variables. In the bivariate case, where $H(\cdot) = h(\cdot)$ since $p = 1$,
the pair of optimal transformations $g^*(\cdot)$ and $h^*(\cdot)$ are:

$$g^*(\hat{y}_\alpha(n)) = \frac{E[h^*(\hat{c}_\alpha(n)) \mid \hat{y}_\alpha(n)]}{\| E[h^*(\hat{c}_\alpha(n)) \mid \hat{y}_\alpha(n)] \|}$$

$$h^*(\hat{c}_\alpha(n)) = E[g^*(\hat{y}_\alpha(n)) \mid \hat{c}_\alpha(n)]$$

where $\| \cdot \| = \{E[(\cdot)^2]\}^{1/2}$.
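
In the multiple control case, where $p > 1$,

$$g^*(\hat{y}_\alpha(n)) = \frac{E\left[\sum_{i=1}^{p} h_i^*(\hat{c}_{\alpha,i}(n)) \,\middle|\, \hat{y}_\alpha(n)\right]}{\left\| E\left[\sum_{i=1}^{p} h_i^*(\hat{c}_{\alpha,i}(n)) \,\middle|\, \hat{y}_\alpha(n)\right] \right\|} \qquad (22)$$

and

$$h_i^*(\hat{c}_{\alpha,i}(n)) = E\left[ g^*(\hat{y}_\alpha(n)) - \sum_{j \neq i} h_j^*(\hat{c}_{\alpha,j}(n)) \,\middle|\, \hat{c}_{\alpha,i}(n) \right]. \qquad (23)$$

The transformations $g^*(\cdot)$ and $h_i^*(\cdot)$ in (22) and (23) will usually be
nonlinear, the exception being when $\hat{y}_\alpha(n)$ and
$\hat{\mathbf{c}}_\alpha(n)$ have a multivariate normal distribution.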

Results from Lancaster (1966) can be used to show that if $\hat{y}_\alpha(n)$ and
$\hat{\mathbf{c}}_\alpha(n)$ have a multivariate normal distribution, the solutions
for $g(\hat{y}_\alpha(n))$ and $H(\hat{\mathbf{c}}_\alpha(n))$ which have maximal
correlation between $g(\hat{y}_\alpha(n))$ and $H(\hat{\mathbf{c}}_\alpha(n))$, over
all measurable functions of finite variance, are the linear transformations which
yield the first Hotelling canonical variables. In other words, when
$\hat{y}_\alpha(n)$ and $\hat{\mathbf{c}}_\alpha(n)$ have a multivariate normal
distribution, using the linear control scheme (11), with the multiple regression
coefficients for $\theta$, produces the greatest amount of variance reduction.
Conversely, whenever the joint distribution of $\hat{y}_\alpha(n)$ and
$\hat{\mathbf{c}}_\alpha(n)$ is not multivariate normal, a nonlinear control offers
the possibility for greater variance reduction over a linear control.



4.3 Estimating the Optimal Nonlinear Transformations 

Determining the optimal transformations in (22) and (23) analytically requires
the joint distribution of $\hat{y}_\alpha(n)$ and $\hat{\mathbf{c}}_\alpha(n)$ which,
in the context of a simulation, is unknown. In the multivariate normal case, the form
of the transformations is known to be linear and one can estimate the coefficients
using one of the methods described earlier. With a nonlinear control, one must first
estimate the form of the transformations.

Breiman and Friedman (1985) also developed the Alternating Conditional
Expectation Algorithm (ACE) as a means for generating nonparametric estimates of
the optimal transformations (22) and (23). In the ACE implementation for finite
data sets of continuous variables, data smooths are used in place of the analytical
conditional expected values. The ACE algorithm produces estimates of the optimal
transformations as sets of fitted values, one set for each variable. Plotting the fitted
values against the original values gives the shape of the estimated transformation for
each variable. ACE also provides an estimate of the maximum obtainable squared
correlation between the transformed response and the sum of transformed predictors.
This $R^2$ estimate is useful as it provides an estimate of an upper bound on the
percent variance reduction one can obtain using the given set of controls.
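
The alternating structure of the algorithm can be sketched for the bivariate case in
a few lines of Python. This is a minimal illustration and not the published ACE
implementation: a crude running mean over nearest neighbours stands in for the data
smooths, and the function names, window width, iteration count and test data are all
assumptions made for the example.

```python
import random

def standardize(values):
    """Shift to mean zero and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]

def smooth(target, by, half_width=10):
    """Crude estimate of E[target | by]: mean of target over neighbours in `by` order."""
    order = sorted(range(len(by)), key=lambda i: by[i])
    fitted = [0.0] * len(by)
    for rank, i in enumerate(order):
        lo = max(0, rank - half_width)
        hi = min(len(order), rank + half_width + 1)
        fitted[i] = sum(target[order[k]] for k in range(lo, hi)) / (hi - lo)
    return fitted

def ace_bivariate(y, c, iterations=20):
    """Alternate the two conditional-expectation updates; return (g, h, R^2 estimate)."""
    g = standardize(y)
    h = list(g)
    for _ in range(iterations):
        h = smooth(g, c)               # h(c) <- E[g(y) | c]
        g = standardize(smooth(h, y))  # g(y) <- E[h(c) | y], rescaled to unit variance
    h_std = standardize(h)
    r = sum(a * b for a, b in zip(g, h_std)) / len(g)
    return g, h, r * r

# Hypothetical data: y depends on c only through c**2, so a straight line fits poorly.
rng = random.Random(1)
c = [rng.gauss(0.0, 1.0) for _ in range(400)]
y = [v * v + 0.1 * rng.gauss(0.0, 1.0) for v in c]
g, h, r2 = ace_bivariate(y, c)
print(f"estimated maximal squared correlation: {r2:.3f}")
```

Plotting the fitted values `h` against `c` would show the shape of the estimated
transformation, which is how the output is used to pick a parametric approximation.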

Since ACE does not give an explicit analytical form for its estimate of the optimal
transformation, one must approximate the optimal transformation with a parametric
nonlinear transformation. The output from ACE is useful in selecting an appropriate
approximating transformation. One possible approximating transformation is the
scaled power transformation

$$h(\hat{c}_\alpha(n), \theta) = \frac{\hat{c}_\alpha(n)^{\theta} - 1}{\theta} \quad \text{for } \theta > -1, \qquad (24)$$

where $\theta$ is an unknown parameter which becomes a coefficient which must be
estimated. Using this transformation, the nonlinear control scheme (21) can become

$$\hat{y}'_\alpha(n) = \hat{y}_\alpha(n) - \theta_1 \left[ \frac{\hat{c}_\alpha(n)^{\theta_2} - 1}{\theta_2} - E\left[ \frac{\hat{c}_\alpha(n)^{\theta_2} - 1}{\theta_2} \right] \right] \qquad (25)$$

where both $\theta_1$ and $\theta_2$ need to be estimated. Other possible
transformations are described in Lewis, Ressler and Wood (1989).
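
A minimal Python sketch of the scaled power transformation (24) is given below. The
function name is hypothetical, and the logarithm returned at $\theta = 0$ is an added
convention (the limiting value of the expression), not something stated in (24).

```python
import math

def scaled_power(c, theta):
    """Scaled power transformation (c**theta - 1) / theta for c > 0."""
    if theta == 0.0:
        return math.log(c)  # limiting value as theta -> 0 (added convention)
    return (c ** theta - 1.0) / theta

c = 0.9
for theta in (0.5, 1.0, 2.0):
    print(f"theta = {theta:.1f}   h(c, theta) = {scaled_power(c, theta):.5f}")
```

At $\theta = 1$ the transformation reduces to $c - 1$, a linear function of the
control.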

As a general rule, a transformation should contain the linear transformation as a
special set of parameter values $\theta_l$. This allows for the linear control to be a
special case of the nonlinear control when the joint distribution between the
statistic of interest and the control is multivariate normal. Choosing the special set
of parameter values $\theta_l$ as starting values for the nonlinear optimizer which
estimates the coefficients initializes the optimizer at the linear control. Any
movement made by the optimizer away from the starting values implies that the
nonlinear control is giving improved variance reduction over the linear control. Thus
using a nonlinear control, one cannot do worse than using a linear control.

One of the problems in choosing an approximating transformation
$h_i(\hat{c}_{\alpha,i}(n), \theta)$ is that $E[h_i(\hat{c}_{\alpha,i}(n), \theta)]$
must be known exactly or approximately. This severely limits the selection of
nonlinear transformations available to approximate $h_i^*(\hat{c}_{\alpha,i}(n))$ as
the necessary expected values may be intractable or unknown for some transformations.
The difficulty in analytically determining the expected value of the transformed
control can be greatly reduced when using monotone transformations of quantile
estimators as controls, as is discussed in the next section.



5 NONLINEAR CONTROL OF QUANTILE ESTIMATES

5.1 The Behavior of Quantiles Under Monotone Transformations 

Quantiles have a property that is especially useful when working with nonlin- 
ear controls. Under strictly monotone transformations of the underlying random 
variable, the quantiles transform monotonely as well. For example, 

• let $h(\cdot)$ be a strictly monotone function with inverse $h^{-1}(\cdot)$,

• let $C$ be a random variable with a continuous, strictly monotone cumulative
distribution function $F_C$ such that for all $\alpha$ between zero and one,
$F_C^{-1}(\alpha) = c_\alpha$, and

• let $W = h(C)$ be the transformed random variable.

By definition of a quantile,

$$\Pr\{C \le c_\alpha\} = \alpha \quad \text{and} \quad \Pr\{W \le w_\alpha\} = \alpha.$$

Therefore, for increasing $h(\cdot)$:

$$\Pr\{W \le w_\alpha\} = \Pr\{h(C) \le w_\alpha\} = \Pr\{C \le h^{-1}(w_\alpha)\} = \alpha.$$

This implies that for all $\alpha$ between zero and one,

$$w_\alpha = h(c_\alpha). \qquad (26)$$

For example, if $C$ has a uniform (0,1) distribution with .9 quantile of
$c_{.9} = .9$, then the .9 quantile of $W = h(C) = C^2$, namely $w_{.9}$, is equal to
$h(c_{.9}) = .9^2 = .81$.

The key point is that the a quantile of a transformed random variable can be 
found by applying the same transformation to the a quantile of the original random 
variable. 
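
The property also holds exactly for order-statistic estimates, which the following
Python sketch checks for the uniform example with $h(C) = C^2$. The quantile
convention used here (the $\lceil \alpha n \rceil$-th smallest value) is an assumption
for the illustration and may differ from the estimator defined in (2).

```python
import math
import random

def sample_quantile(data, alpha):
    """Order-statistic estimate: the ceil(alpha * n)-th smallest value (assumed convention)."""
    ordered = sorted(data)
    k = max(1, math.ceil(alpha * len(ordered)))
    return ordered[k - 1]

def h(value):
    """Strictly increasing on (0, 1)."""
    return value * value

rng = random.Random(7)
c = [rng.random() for _ in range(2000)]

quantile_then_transform = h(sample_quantile(c, 0.9))
transform_then_quantile = sample_quantile([h(v) for v in c], 0.9)
print(quantile_then_transform, transform_then_quantile)
```

Both routes return the same number because an increasing transformation preserves
the ordering of the sample, and that number is close to $.9^2 = .81$.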

5.2 Controlling Quantile Estimates 

The fact that quantiles transform monotonely under strictly monotone
transformations of the underlying random variable can also be useful in computing the
expected value of a transformed quantile estimator. It is important to note that
the random variable being transformed is the quantile estimator $\hat{c}_\alpha(n)$
and not the underlying $C$. For a given nonlinear transformation, it may be possible
to compute the expected value of $h(\hat{c}_\alpha(n))$. For example, if $C$ has a
uniform (0,1) distribution, and $h(\hat{c}_\alpha(n))$ is the scaled power
transformation (24), where $\theta$ is constrained to be non-negative,
$h(\hat{c}_\alpha(n))$ has a Beta distribution with a known expected value. For
other distributions of $\hat{c}_\alpha(n)$, or other transformations $h(\cdot)$, the
expected value may not be tractable. This is where the use of strictly monotone
transformations can help.

We are interested in the expected value of the transformed quantile estimator.
When a strictly monotone transformation is applied to the underlying $C$, the
quantile estimator $\hat{c}_\alpha(n)$ transforms monotonely as well, i.e. if
$\hat{c}_\alpha(n)$ estimates $c_\alpha$ and $h(C) = W$, with $\alpha$ quantile
$w_\alpha$, then

$$\hat{w}_\alpha(n) = h(\hat{c}_\alpha(n)).$$

From the point of view of the quantile estimator, applying a strictly monotone
transformation to a quantile estimator, $\hat{c}_\alpha(n)$, yields the same estimate
as using the identical transformation on the underlying random variable $C$ and then
using (2) to estimate the $\alpha$ quantile. Although for small $n$

$$E[h(\hat{c}_\alpha(n))] \neq h(E[\hat{c}_\alpha(n)]),$$

it is true that as $n \to \infty$,

$$E[h(\hat{c}_\alpha(n))] \to h(c_\alpha) \quad \text{and} \quad h(E[\hat{c}_\alpha(n)]) \to h(c_\alpha)$$

so that asymptotically, the expected value of the transformed quantile estimator
is the same as the expected value of the quantile estimator of the transformed
underlying random variable.

Since the asymptotic expected values are the same, if the individual transformation
functions $h(\cdot)$ in the control function $H(\cdot)$ are restricted to strictly
monotone transformations, one can approximate $E[h(\hat{c}_\alpha(n), \theta)]$ in
the nonlinear control function $H(\hat{c}_\alpha(n), \theta)$ with the asymptotic
expected value of the transformed control, namely, the transformed value of the
$\alpha$ quantile, $h(c_\alpha, \theta)$. Calculating $h(c_\alpha, \theta)$ is trivial
since $c_\alpha$ is a constant. Using the asymptotic expected value with the scaled
power transformation, the nonlinear control scheme becomes

$$\hat{y}'_\alpha(n) = \hat{y}_\alpha(n) - \theta_1 \left[ \frac{\hat{c}_\alpha(n)^{\theta_2} - 1}{\theta_2} - \frac{c_\alpha^{\theta_2} - 1}{\theta_2} \right].$$

The use of the approximation introduces bias into the control function, but it is
still $O(1/n)$ and may, as in the linear control case, reduce the magnitude of the
first order bias of the controlled estimate. The key point is that the analytical
burden of calculating the expected value of the transformed control has been greatly
reduced.

Once the approximating transformations for the $h_i^*(\cdot)$ have been selected, one
can use either the section or subsection estimator to estimate $\theta$ and calculate
the final, controlled point estimate in (15) and an estimate of the variance of the
point estimate. Regardless of the method, the coefficients in $\theta$ for
$H(\hat{\mathbf{c}}_\alpha(n), \theta)$ can be estimated using a nonlinear
least-squares regression algorithm as the nonlinear optimizer.
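
The estimation step can be sketched in Python for the scaled power control with the
asymptotic expected value. This is an illustration under stated assumptions rather
than the procedure used in the study: a profile search over a grid of $\theta_2$
values, with $\theta_1$ solved by ordinary least squares at each grid point, stands
in for a general nonlinear least-squares routine, and the population
($Y = C^3$ plus uniform noise), the quantile convention and all names are
hypothetical.

```python
import math
import random

def scaled_power(c, theta):
    return (c ** theta - 1.0) / theta

def quantile(data, alpha):
    """ceil(alpha * n)-th smallest value (assumed convention)."""
    ordered = sorted(data)
    return ordered[max(1, math.ceil(alpha * len(ordered))) - 1]

def profile_fit(y_hat, c_hat, c_alpha, theta2_grid):
    """For each theta2, solve theta1 by least squares; return the best (theta1, theta2, sse)."""
    m = len(y_hat)
    y_bar = sum(y_hat) / m
    syy = sum((y - y_bar) ** 2 for y in y_hat)
    best = None
    for theta2 in theta2_grid:
        # Transformed control centred at its asymptotic expected value h(c_alpha, theta2).
        d = [scaled_power(c, theta2) - scaled_power(c_alpha, theta2) for c in c_hat]
        d_bar = sum(d) / m
        sxx = sum((v - d_bar) ** 2 for v in d)
        sxy = sum((v - d_bar) * (y - y_bar) for v, y in zip(d, y_hat))
        theta1 = sxy / sxx
        sse = syy - theta1 * sxy  # residual sum of squares after control
        if best is None or sse < best[2]:
            best = (theta1, theta2, sse)
    return best

rng = random.Random(2024)
alpha, m, n = 0.9, 40, 50  # hypothetical quantile level and section layout
y_hat, c_hat = [], []
for _ in range(m):
    c = [rng.random() for _ in range(n)]
    y = [v ** 3 + 0.05 * rng.random() for v in c]  # hypothetical population Y
    c_hat.append(quantile(c, alpha))
    y_hat.append(quantile(y, alpha))

grid = [0.25 * k for k in range(1, 25)]             # 0.25, 0.50, ..., 6.00 (includes 1.0)
linear = profile_fit(y_hat, c_hat, alpha, [1.0])    # c_alpha = alpha for uniform (0,1)
nonlinear = profile_fit(y_hat, c_hat, alpha, grid)
print("linear    (theta1, theta2, sse):", linear)
print("nonlinear (theta1, theta2, sse):", nonlinear)
```

Because the grid contains $\theta_2 = 1$, the fitted nonlinear control cannot have a
larger residual sum of squares than the linear control on the data used for fitting.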

5.3 Selection of m and n for a Nonlinearly Controlled Section
Estimate when $\theta$ Must be Estimated

A major factor that must also be considered in the selection of m and n for
fixed sample size N is the impact of n, the number of samples used to compute
the individual quantile estimates, on the joint normality of the quantile estimates.
When computing a controlled section estimate and estimating the coefficients
$\theta$, the impact of m and n on the variance of the estimate $\hat{\theta}(m,n)$
must also be considered.

As previously discussed, given a fixed sample size N the values of m and n
which minimize the mean square error of the crude section estimate are a function
of the coefficients in the asymptotic expansions for the mean and variance of the
estimator, equations (4) and (5). The variance of the controlled estimate
$\hat{y}'_\alpha(n)$ is a function of the variance of the estimate of the coefficients
$\theta$ in addition to the variance of the crude estimate, $\hat{y}_\alpha(n)$, and
the variance of the estimate of the control $\hat{c}_\alpha(n)$. In general, the bias
and variance of coefficients estimated via least-squares nonlinear regression are
decreasing functions of the number of estimates used as data in the regression (see
Gallant, 1987, chap. 1). When using the section estimator, this implies that one would
like m, the number of quantile estimates, to be large. However, as m increases for
fixed N, n must decrease, increasing the bias and variance of the estimates used as
data in the regression. If n is too small, the bias and variance of the estimates
could be such that there is actually very little nonlinear or even linear
relationship between the crude and control quantile estimates so that any control
scheme is ineffective.

If n, the number of samples in a section, is too large, the joint distribution of 
the crude and control quantile estimates approaches a joint normal distribution as 
seen in part 2.1. The impact of the joint normality is that the optimal nonlinear 
transformation is now the linear transformation of the linear control as seen in 
part 4.2 and one has lost the increased effectiveness of the nonlinear control. This 
result is similar to one obtained by Glynn and Whitt (1989) who state that "no
improvement in asymptotic efficiency can be achieved by generalizing the notion
of control variables from a linear form to a nonlinear setting." They go on to
say, however, "...this does not preclude the possibility of better performance by
nonlinear methods in a small sample context." The key point is that by avoiding
the asymptotic joint normality through keeping small the number of samples used 
to compute the individual quantile estimates, the nonlinear controls can be more 
effective than the asymptotic linear controls. 

When using the subsection estimator, the interplay between m and n changes.
One must now consider the impact of choices for $v$, the number of subsection
estimates, and $l$, the number of samples used to compute a subsection estimate. With
the section estimator one wanted m, as the number of points in the regression, to be
large. For the subsection estimator m is the number of estimates of $\theta$ to
compute and a large m implies more regression computations that have to be made, as
well as a small value for n. For any given value of n, the choice of $v$ and $l$ has
slightly different considerations than the choice of m and n for the section
estimator. An important consideration for the subsection estimator is that $l$ be
"close" to n so that the joint distribution of $\hat{y}_\alpha(l)$ and
$\hat{c}_\alpha(l)$ will be similar in shape to that of $\hat{y}_\alpha(n)$
and $\hat{c}_\alpha(n)$. If the two joint distributions are not similar in shape, then
the subsection estimate of $\theta$ could be very biased, reducing the effectiveness
of the control. This suggests making $v$ as small as possible while still being two to
three times the number of coefficients being estimated. If n is too small, the few
samples available for the $v$ subsections of length $l$ will force both $v$ and $l$ to
be small, resulting in possibly little structure to exploit, or unreliable estimates
of $\theta$, both of which result in ineffective control. The solution would seem to
be to make n large.

Making n too large results in the same problems for the subsection estimator as
it did for the section estimator. If n is too large, there are few controlled section
estimates which reduces the precision of the variance estimate. More importantly,
n is still the critical factor for the joint normality of the estimate being controlled
and the control estimate. If n is too large, the asymptotic joint normality reduces
the effectiveness of the nonlinear control to that of the linear control.

The selection of m and n for a fixed N which minimizes the bias, variance or mean
square error of the controlled estimate is a complicated function of many parameters.
These parameters include the value of $\alpha$, the sample size N, and unfortunately,
because of the need to estimate $\theta$, characteristics of the unknown joint
distribution of the underlying populations Y and C. An alternative to attempting to
estimate the optimal m and n via a functional approximation is to use graphical
methods to assist in the selection of m and n such as in Heidelberger and Lewis
(1981). In the experiment described below, for a given fixed sample size N, the
results of using different values of n are compared graphically as well as numerically
to assist in selecting m and n.
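
To make the trade-off between m and n concrete, the following Python sketch computes
a crude section estimate for several section sizes n from one sample of N = 1000
values. It assumes the usual sectioning form (the mean of the m = N/n section
quantile estimates, with standard deviation estimated by the sample standard
deviation of those estimates divided by the square root of m) and a simple
order-statistic quantile; the paper's own estimators are defined in its earlier
equations, and the uniform population here is only a placeholder.

```python
import math
import random

def quantile(data, alpha):
    """ceil(alpha * n)-th smallest value (assumed convention)."""
    ordered = sorted(data)
    return ordered[max(1, math.ceil(alpha * len(ordered))) - 1]

def section_estimate(sample, n, alpha):
    """Return (point estimate, estimated standard deviation, m) for sections of size n."""
    m = len(sample) // n
    estimates = [quantile(sample[j * n:(j + 1) * n], alpha) for j in range(m)]
    mean = sum(estimates) / m
    variance = sum((e - mean) ** 2 for e in estimates) / (m - 1)
    return mean, math.sqrt(variance / m), m

rng = random.Random(3)
sample = [rng.random() for _ in range(1000)]  # placeholder population: uniform (0,1)
results = {n: section_estimate(sample, n, 0.95) for n in (25, 100, 250, 500)}
for n, (point, sd, m) in results.items():
    print(f"n = {n:3d}  m = {m:2d}  estimate = {point:.4f}  est. std dev = {sd:.4f}")
```

Repeating this over many independent samples and plotting the results by n gives the
kind of graphical comparison described above.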

6 THE SIMULATION EXPERIMENT 

6.1 The Factors 

The simulation experiment used M replications to investigate simulation
procedures for estimating the $\alpha$ quantile of a distribution and estimating the
variance of the quantile estimate. The factors in the simulation experiment included
the distribution of the underlying population of interest, the value for $\alpha$, the
method of estimating the quantile, the sample size, the choice of m and n for the
section estimator and the choice of the m for the m-fold jackknife estimator. All of
the computations were performed in the APL2-based statistical computing package
GRAFSTAT.

6.2 The Statistic of Interest 

The distribution used in the results presented here was suggested by Hsu and
Nelson (1987). The statistic of interest is the estimator for the $\alpha$ quantile of
a random variable Y where



and X has a uniform (0,1) distribution and $\epsilon$ has a uniform (0,.5)
distribution and is independent of X. The untransformed control is the estimator of
the $\alpha$ quantile of X. The value of $\alpha$ will be .95 for the results presented
here. The true value for the .95 quantile of Y, namely $y_{.95}$, is .164167.

Figure 1 shows the nonlinear nature of the relationship between $\hat{y}_\alpha(n)$
and $\hat{x}_\alpha(n)$ for four values of n with the sample size N fixed at 1000.
Prior to plotting, the quantile estimates were standardized by subtracting off the
sample mean of the quantile estimates from each estimate, and then dividing each
estimate by the sample standard deviation of the quantile estimates. Thus the "true"
values are zero. The quantile estimates were standardized so that one could visually
assess the correlation between the quantile estimator of interest and the control
quantile estimator. Note that the scales of the axes in Figure 1 change as n increases
to 100, 250 and 500 as the ranges of the standardized quantile estimates become more
concentrated about the true values of zero.

For n = 25 in Figure 1, the relationship between $\hat{y}_\alpha(n)$ and
$\hat{x}_\alpha(n)$ is highly nonlinear. As n increases to 100, 250 and 500 the
relationship seems to become more linear as the number of estimates available
decreases to just two at n = 500 where, with only two pairs of estimates, the
relationship must appear linear. However, one can see from Figure 2, where N = 6000,
that even for n = 1000 the relationship between $\hat{y}_\alpha(n)$ and
$\hat{x}_\alpha(n)$ still has nonlinear tendencies. In all cases, the relationship
appears to be one that would be well approximated by a monotone transformation.

Figure 1: Scatterplots illustrating the joint distribution of standardized section point
estimates of the .95 quantile of Y and X for n = 25, 100, 250, and 500 from a sample of
N = 1000 samples. Since the estimates are standardized, the true values are zero.

Figure 2: Scatterplots illustrating the joint distribution of standardized section point
estimates of the .95 quantile of Y and X for n = 250, 500, 1000, and 1500 from a sample of
N = 6000 samples. Since the estimates are standardized, the true values are zero.

6.3 The Section Estimator versus the Jackknife Estimator



As stated previously, the section estimator was preferred over the jackknife
estimator for estimating the $\alpha$ quantile along with an estimate of the variance
(standard deviation) of the quantile estimator. Analytically, the section estimator of
the variance of the section estimate from (17) is an unbiased estimator and the
section estimate of the standard deviation has $O(1/m)$ bias. We will graphically show
the performance of the section estimate of the standard deviation so that the graphs
can be compared with the performance of the jackknife estimation procedure.

The performance of the section estimator can be seen in Figure 3. The top graph
of Figure 3 shows a series of boxplots of section point estimates of the .95 quantile
of Y calculated using (6). For a discussion of boxplots see Chambers et al. (1983,
chap. 2). The boxplots summarize the distribution of the section estimates, for
varying n, from 300 independent replications of N = 1000 samples. The data under
the graph are the sample statistics from the 300 estimates in each boxplot. The
bottom graph consists of boxplots of section estimates of the standard deviation,
calculated using (7), corresponding to the point estimates in the top graph, again
with the sample statistics underneath.

The top graph in Figure 3 shows that as n increases from 10 to 500, for a fixed 
sample size N = 1000, the bias in the section point estimates tends to decrease as 
expected. However, the top graph also shows that increasing n does not necessarily 
decrease the sample variance of the section quantile estimator because of the impact 
of decreasing the number of estimates, m, with which the section point estimate of 
the quantile is computed. 

The bottom graph of Figure 3, of the section estimates of the standard deviation
of the section point estimate, shows another effect of increasing n. As n increases
and m decreases, it is easy to see that the standard deviation of the estimates of
the standard deviation also increases, from .00227 for n = 10 to .01170 for n = 500,
so that the section estimate of the standard deviation becomes less precise. As the
section estimate of the standard deviation has $O(1/m)$ bias, one would expect that
the section estimate of the standard deviation should be closer to the estimate of the
sample standard deviation for small n. A check of the sample standard deviation in
the top graph against the mean of the section estimates of the standard deviation in
the bottom graph shows that in fact the two values of .02030 and .01974 are fairly
close at n = 10 and become farther apart as n increases. The significance of the
difference will be examined in a moment.

Figure 4 shows the performance of the jackknife estimator for $y_\alpha$. The top
boxplots are the m-fold jackknife estimate of the .95 quantile of Y, for varying m,
from the same 300 independent replications of N = 1000 samples used for the section
estimates in Figure 3. The data under the graph are the sample statistics from the
300 estimates in each boxplot. The bottom graph in Figure 4 consists of boxplots
of the corresponding jackknife estimates of the standard deviation of the jackknife
point estimates in the top graph, again with the sample statistics underneath.

The top graph in Figure 4 shows that for a fixed sample size N = 1000, the
jackknife estimates become highly variable as m increases, as well as having in
general a slight positive bias ($y_\alpha$ = .164167). The main reason for not using
the jackknife technique, however, is the poor performance of the jackknife estimate of
the standard deviation of the point estimate. A check of the sample standard deviation
in the top graph against the mean of the jackknife estimates of the standard deviation
in the bottom graph shows that the two estimates of the standard deviation become
quite far apart as m increases. For m = 2 the values are the closest, at .02202 for
the sample standard deviation of the point estimate and .01555 for the jackknife
estimate of the standard deviation of the point estimate.

The purpose of estimating the standard deviation of the point estimators is to 
have a measure of the precision of the point estimate. The section and jackknife 
estimators of the standard deviation of the point estimate are both trying to estimate 
the standard deviation of a sample of section or jackknife point estimates. To 
more formally assess their performance we used the data from the 300 independent 
replications previously shown in Figures 3 and 4. The procedure used for both the 
section estimates and the jackknife estimates was as follows: 

1. The point estimates from the 300 replications were sectioned into 30 inde- 
pendent sections of 10 point estimates each. The sample standard deviation 
was computed for each of the 30 sections. Thus there were 30 independent 
estimates of the sample standard deviation for both the section estimates and 
the jackknife estimates. 

2. Likewise, the 300 estimates of the standard deviation were sectioned into 30 
independent sections of 10 estimates of the standard deviation each. These 
10 standard deviation estimates were averaged to get a single estimate of the 
standard deviation for each section. Thus there were 30 independent estimates 
of the standard deviation from the estimator, for both the section estimator 
and the jackknife estimator. 

3. For each of the 30 sections, the mean of the 10 section or jackknife estimates of
the standard deviation from step 2 was subtracted from the sample estimate 
of the standard deviation from step 1 to yield 30 independent estimates of the 
difference. 

If the section or jackknife estimator is a reliable estimate of the sample standard 
deviation, then the difference of the sample standard deviation and the section or 
jackknife estimate of the standard deviation should be zero. 
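
The three steps above can be sketched in Python as follows. The function names are
hypothetical and the 300 point estimates and 300 standard deviation estimates are
synthetic placeholders generated only to exercise the code; they are not the
replication results from the experiment.

```python
import random

def sample_sd(values):
    """Sample standard deviation with divisor k - 1."""
    k = len(values)
    mean = sum(values) / k
    return (sum((v - mean) ** 2 for v in values) / (k - 1)) ** 0.5

def sd_differences(point_estimates, sd_estimates, section_size=10):
    """Steps 1-3: per section, sample sd of the points minus the mean of the sd estimates."""
    differences = []
    for start in range(0, len(point_estimates), section_size):
        points = point_estimates[start:start + section_size]
        sds = sd_estimates[start:start + section_size]
        differences.append(sample_sd(points) - sum(sds) / len(sds))
    return differences

rng = random.Random(11)
points = [rng.gauss(0.164, 0.02) for _ in range(300)]  # placeholder point estimates
sds = [rng.gauss(0.02, 0.002) for _ in range(300)]     # placeholder sd estimates
diffs = sd_differences(points, sds)
mean_diff = sum(diffs) / len(diffs)
std_error = sample_sd(diffs) / len(diffs) ** 0.5
print(f"mean difference = {mean_diff:.5f}   standard error = {std_error:.5f}")
```

Comparing the mean difference with its standard error is the check applied to the
section and jackknife estimators in Figure 5.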

Note that while the same data is used for all of the section and jackknife
estimators so that there is no independence between the different estimators, the 30
estimates of the difference for a single estimator, i.e., the section estimate with
n = 25 or the 2-fold jackknife, are independent. Figure 5 has boxplots of the
differences for both the section estimates (top graph) and the jackknife estimates
(bottom graph).

The top graph in Figure 5, of the section estimator, shows that the sample mean
for the smaller n is within one standard error of zero. When n is increased to 250 and
500, where the section estimates of the standard deviation are more variable because
of the small m, the means of the differences, .00140 and .00300, are still within three
standard errors of zero. This shows that the section estimator of the standard
deviation of the section point estimate is a reliable estimate of the sample standard
deviation of the point estimate.

The Data consists of Each Replication's Quantile Estimate

Std Dev     0.02030  0.01987  0.02319  0.01651  0.02025  0.01952
Std Error   0.00117  0.00114  0.00133  0.00106  0.00118  0.00112

The Estimated Std Devs of the Quantile Estimates

Std Dev     0.00227  0.00414  0.00570  0.00600  0.00040  0.01170
Std Error   0.00013  0.00023  0.00033  0.00034  0.00046  0.00067

Figure 3: Boxplots of section point estimates of $y_{.95}$ (top) and section estimates of the
standard deviation of the point estimates (bottom) for 300 replications of N = 1000 samples
and varying n.

The Data consists of Each Replication's Quantile Estimate

Std Error   0.00127  0.00145  0.00180  0.00266  0.00429

The Estimated Std Devs of the Quantile Estimates

Mean        0.01555  0.01683  0.01986  0.02052  0.02060
Std Dev     0.01170  0.00969  0.00962  0.01150  0.01334
Std Error   0.00007  0.00055  0.00055  0.00086  0.00077

Figure 4: Boxplots of m-fold jackknife point estimates of $y_{.95}$ (top) and m-fold jackknife
estimates of the standard deviation of the point estimates (bottom) for 300 replications of
N = 1000 samples and varying m.

The bottom graph in Figure 5 shows the opposite for the jackknife estimator.
For no m is the mean of the differences within three standard errors of zero. If one
tests, for each m, the normality of the differences for the jackknife estimates, one
cannot reject at the .95 confidence level the hypothesis that the differences have
a normal distribution. For each m, the .95 confidence interval for the mean of the
fitted normal distribution does not include zero. Thus the jackknife estimate of
the standard deviation of a jackknifed quantile estimate is a biased and unreliable
estimate. We feel this is strong evidence for not using the jackknife technique for
estimating quantiles and the variance of the quantile estimate.

Figure 5: Boxplots of differences between estimates of the sample standard deviation of
the point estimate and the section (top) and m-fold jackknife (bottom) estimates of the
standard deviation of the point estimate based on 30 sections of M = 300 independent
replications of N = 1000 samples each.



6.4 Comparing the Crude, Linearly Controlled and Nonlinearly 
Controlled Estimators 



The crude, linearly controlled and nonlinearly controlled estimators will be
compared both graphically and numerically. Now the number of replications is M = 20
and the number of samples in each replication is fixed at N = 1000. The section
estimator will be used for all three estimators. For the nonlinearly controlled
estimator, the monotone transformation will be the scaled power transformation so
that the control function will be



y'ai^) = 

6.4.1 Comparison When the Sample Size N = 1000 

Figure 6 shows the performance of the three estimators as triplets of boxplots for 
n = 25, 100, 250, and 500. In each of the graphs that follow, the left boxplot of the 
triple is the crude estimate, the middle boxplot of the triple is the linearly controlled 
estimate and the right boxplot of the triple is the nonlinearly controlled estimate. 
The statistics under each graph are the respective means of the data in the boxplot 
for the crude, linearly controlled and nonlinearly controlled estimators. 

The boxplots in the top graph of Figure 6 contain the final quantile estimates 
for each of the estimators. This graph shows the effect of a control function that 
is biased because of the use of the asymptotic expected value. Without the biased 
control function each of the boxplots would look virtually the same because the 
control function would be mean zero and so would not change the expected value of 
the point estimate. The bias in the control function tends to reduce the bias of the 
point estimate with the exception of the linearly controlled estimate at n = 25. 

The boxplots in the bottom graph of Figure 6 contain the section estimates of 
the standard deviation of the point estimators. One can see that as n increases, 
the mean of the estimated standard deviation of the linearly controlled estimate 
decreases, from .01123 to .00391, while the mean of the estimated standard deviation 
for the nonlinear control increases, once n is greater than 100, from .00241 to .00374, 
until the values for the linear control and the nonlinear control are about the same. 
In fact, the estimator that minimizes the variance can be seen to be the nonlinearly 
controlled estimator at n = 100 with a value of .00241. It is also clear that when n is 
large at 250 and 500, the small m of 4 and 2 causes higher variance in the estimates 
of the standard deviation. 

The top graph in Figure 7 combines the two graphs from Figure 6, the bias and 
the variance, in that it contains the estimated mean square error of the estimators. 
It can be seen with this graph that the estimator that minimizes the mean square 
error is again the nonlinearly controlled estimator at n = 100 with a value of .00005. 
In fact the estimated mean square error for this estimator is under one-half of the 
best mean square error for the linear control of .00013 that is at n = 250. At 
n = 500 the values are the same, .00029, since there are only 2 quantile estimates 
with which to work. The other factor affecting the nonlinear control besides having 
only 2 quantile estimates to work with is that at n = 500 the joint distribution of 
the crude estimate and the control estimate is closer to multivariate normal than at 
n = 100. 
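
The way the mean square error combines the two graphs of Figure 6 can be sketched as follows. This assumes the estimated mean square error of a replication is its estimated variance plus its squared bias against a known true quantile, averaged over replications; the function names and the `true_value` argument are illustrative assumptions, not the paper's notation.

```python
def estimated_mse(point_estimate, std_dev_estimate, true_value):
    """Estimated mean square error = estimated variance + squared bias."""
    bias = point_estimate - true_value
    return std_dev_estimate ** 2 + bias ** 2


def mean_mse(replications, true_value):
    """Average the per-replication MSE over (point estimate, std dev) pairs."""
    values = [estimated_mse(p, s, true_value) for p, s in replications]
    return sum(values) / len(values)
```

Under this reading, an estimator with a small standard deviation can still have a large mean square error if its bias is large, which is why the two panels of Figure 6 have to be read together.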

The bottom graph in Figure 7 is a summary of the percent variance reduction 
achieved by the various estimators. The percent variance reduction for each esti- 
mator is computed using the estimate of the variance of the crude estimate which 
is why the value for the crude estimator is 0. This graph again highlights the effec- 
tiveness of the nonlinearly controlled estimator at smaller n. The highest percent 
variance reduction is .97568, which is actually achieved at n = 25 and not n = 100 
because the percent variance reduction is a relative measure and the crude estimator 
at n = 25 had higher variance than the crude estimator at n = 100. This graph also 
points out the high variability of the variance reduction for large n as the number 
of quantile estimates becomes small. 
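
The relative nature of this measure can be sketched in a few lines. The sketch assumes the variance reduction is one minus the ratio of the controlled variance to the crude variance; the numeric inputs are hypothetical and are chosen only to show the effect of a larger crude variance.

```python
def variance_reduction(crude_std_dev, controlled_std_dev):
    """Fraction of the crude variance removed by the control (0 for the crude estimator)."""
    return 1.0 - (controlled_std_dev / crude_std_dev) ** 2


# Hypothetical values: the same controlled std dev earns a larger relative
# reduction when the crude std dev it is compared against is larger.
print(variance_reduction(0.020, 0.0025))
print(variance_reduction(0.010, 0.0025))
```

This is why the largest reduction can occur at a section size other than the one that minimizes the variance itself.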

                 n = 25     n = 100    n = 250    n = 500
Lin Con          0.19030    0.16949    0.16763    0.16212
Nonln C          0.16146    0.16200    0.16525    0.16176

The Estimated Std Devs of the Quantile Estimates

                 n = 25     n = 100    n = 250    n = 500
Lin Con          0.01123    0.00573    0.00397    0.00391
Nonln C          0.00274    0.00241    0.00307    0.00374

Figure 6: Boxplots of section crude, linearly controlled and nonlinearly controlled estimators 
showing the point quantile estimates of y 95 (top) and the estimates of the standard deviation 
of the point estimates (bottom) from M = 20 independent replications of N = 1000 for 
varying n. 

M.S.E.  The Estimated Mean-Square Error

                 n = 25     n = 100    n = 250    n = 500
Lin Con          0.00175    0.00021    0.00013    0.00029
Nonln C          0.00007    0.00005    0.00011    0.00020

% V.R.  Variance Reduction Based on Estimated Std Devs

                 n = 25     n = 100    n = 250    n = 500
Crude            0          0          0          0
Lin Con          0.59929    0.64540    0.64167    0.63737
Nonln C          0.97508    0.04504    0.85808    0.54230

Figure 7: Boxplots of section crude, linearly controlled and nonlinearly controlled estimators 
showing the estimated mean square error (top) and percent variance reduction (bottom) 
from M = 20 independent replications of N = 1000 for varying n. 

6.4.2 Comparison When the Sample Size N = 5000 

The next pairs of graphs, Figures 8 and 9, are identical in nature to the graphs for 
N = 1000, only now the data are from estimates made from a sample size of N = 5000. 
The number of samples used to compute each section estimate n is unchanged so 
increasing the sample size only increases m, the number of quantile estimates. The 
larger m greatly reduces the problem of high variability of the estimates caused by 
having only 2 quantile estimates with which to work at n = 500. 
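
The relationship between N, n and m can be made concrete with a small sketch of a section estimator. It assumes the point estimate is the mean of the m = N/n section quantiles and that its standard deviation is estimated from the sample standard deviation of those section quantiles; the function names, the order-statistic quantile rule and the exponential test data are all illustrative assumptions rather than the paper's setup.

```python
import math
import random
import statistics


def sample_quantile(values, p):
    """Order-statistic estimate of the p-quantile (assumed rule: ceil(n*p)-th smallest)."""
    ordered = sorted(values)
    k = max(1, math.ceil(len(ordered) * p))
    return ordered[k - 1]


def section_estimate(sample, n, p):
    """Split `sample` into m = len(sample) // n sections of size n.

    Returns (point estimate, estimated std dev of the point estimate, m).
    """
    m = len(sample) // n
    if m < 2:
        raise ValueError("need at least two sections to estimate a variance")
    section_quantiles = [
        sample_quantile(sample[i * n:(i + 1) * n], p) for i in range(m)
    ]
    point = statistics.fmean(section_quantiles)
    std_dev = statistics.stdev(section_quantiles) / math.sqrt(m)
    return point, std_dev, m


rng = random.Random(0)
for N in (1000, 5000):
    data = [rng.expovariate(1.0) for _ in range(N)]   # hypothetical test data
    for n in (25, 100, 250, 500):
        point, sd, m = section_estimate(data, n, 0.95)
        print(f"N={N} n={n} m={m} point={point:.4f} sd={sd:.4f}")
```

With n = 500, raising N from 1000 to 5000 raises m from 2 to 10, so the standard deviation is estimated from ten section quantiles rather than two.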

In the top graph of Figure 8, increasing m has slightly improved the bias of the 
mean of the nonlinearly controlled estimates so that it is now less than the bias 
of the crude estimate for each n. At the same time the bias of the mean of the 
linearly controlled estimates has increased. A more significant impact of increasing 
m, shown in the bottom graph, is the drop in the estimated standard deviations 
for all estimators as compared to N = 1000. The variability of the estimates of the 
standard deviation has decreased as well. 

The mean square errors in the top graph of Figure 9 show again that the nonlinear 
control at n = 100 does better than the best linearly controlled estimate. However, 
as n increases, one can lose the effectiveness of the nonlinear control as both the 
number of quantile estimates decreases and the quantile estimates approach multivariate 
normality. The impact of increasing N and m from Figure 7 is seen in the 
bottom graph of Figure 9 as the variability of the estimate of the percent variance 
reduction is greatly reduced. 

7 SUMMARY 

Nonlinear controls have been seen to be effective in improving the variance reduc- 
tion over linearly controlled estimates of the mean. Sectioning is a useful procedure 
for computing point estimates for quantiles along with an estimate of the variance of 
the point estimate. The jackknife is not a useful procedure as the jackknife estimate 
of the variance of the jackknife point estimate is unreliable. Controlling quantiles 
with nonlinear controls is analytically tractable if the nonlinear transformations of 
the control quantile estimator are limited to strictly monotone functions. With this 
restriction, one can approximate the expected value of the transformed quantile es- 
timator with its asymptotic expected value, namely the transformed value of the 
true quantile for the control. The approximation induces additional bias into the 
control function. However, use of a biased control function can reduce the first-order 
bias in the controlled estimate. 
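
The construction summarized above can be sketched as follows. The sketch assumes the scaled power transformation has the common Box-Cox form, that the control coefficient is the least-squares slope computed from the section quantiles, and that the control is centered at the transformed true control quantile (its asymptotic, not exact, expected value). The names `scaled_power`, `controlled_estimate`, `c_true` and `lam` are hypothetical, and the paper's own procedure for estimating the coefficient may differ.

```python
import math
import statistics


def scaled_power(x, lam):
    """Scaled power transform (assumed Box-Cox form); strictly increasing for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam


def controlled_estimate(y_sections, c_sections, c_true, lam=None):
    """Control the mean of the section quantiles y using section quantiles c.

    lam=None gives the linear control c - c_true. Otherwise the control is
    g(c) - g(c_true), with g the scaled power transform; g(c_true) is only the
    asymptotic expected value of g(c), so the control is slightly biased.
    """
    g = (lambda x: x) if lam is None else (lambda x: scaled_power(x, lam))
    controls = [g(c) - g(c_true) for c in c_sections]
    y_bar = statistics.fmean(y_sections)
    k_bar = statistics.fmean(controls)
    sxy = sum((k - k_bar) * (y - y_bar) for k, y in zip(controls, y_sections))
    sxx = sum((k - k_bar) ** 2 for k in controls)
    if sxx == 0:
        raise ValueError("control values are constant; coefficient undefined")
    beta = sxy / sxx                     # least-squares control coefficient
    return y_bar - beta * k_bar
```

Because the transformation is strictly monotone, the quantile of the transformed control is the transform of the control quantile, so no further analysis is needed to center the control. Only `c_true` has to be known.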

Finally, when one is considering the choice of m and n to use for the sectioning 
estimator, one must keep n small and avoid approaching the asymptotic multivariate 
normal distribution. As the joint distribution of the crude estimate of the quantile 
of interest and the control quantile estimate approaches multivariate normality, the 
effectiveness of the nonlinear control reduces to that of the linear control. 

                 n = 25     n = 100    n = 250        n = 500
Lin Con          0.20423    0.17666    0.16777        0.16736
Nonln C          0.16150    0.16293    (illegible)    0.16550



The Estimated Std Devs of the Quantile Estimates

                 n = 25     n = 100    n = 250    n = 500
Lin Con          0.00570    0.00329    0.00220    0.00164
Nonln C          0.00143    0.00077    0.00112    0.00117

Figure 8: Boxplots of section crude, linearly controlled and nonlinearly controlled estimators 
showing the point quantile estimates of y 95 (top) and the estimates of the standard deviation 
of the point estimates (bottom) from M = 20 independent replications of N = 5000 for 
varying n. 



M.S.E.  The Estimated Mean-Square Error

                 n = 25     n = 100    n = 250    n = 500
Lin Con          0.00176    0.00020    0.00006    0.00002
Nonln C          0.00003    0.00001    0.00003    0.00001

% V.R.  Variance Reduction Based on Estimated Std Devs

                 n = 25     n = 100    n = 250    n = 500
Crude            0          0          0          0
Lin Con          0.55104    0.02764    0.92006    0.95463
Nonln C          0.97160    0.98971    0.97660    0.07226

Figure 9: Boxplots of section crude, linearly controlled and nonlinearly controlled estimators 
showing the estimated mean square error (top) and percent variance reduction (bottom) 
from M = 20 independent replications of N = 5000 for varying n. 